We often hear "We don't trade with ants" as an argument against AI cooperating with humans. But we don't trade with ants because we can't communicate with them, not because they're useless – ants could do many useful things for us if we could coordinate. AI will likely be able to communicate with us, and Katja questions whether this analogy holds.

habryka
Context: LessWrong has been acquired by EA

Goodbye EA. I am sorry we messed up.

EA has decided not to go ahead with their acquisition of LessWrong. Just before midnight last night, the Lightcone Infrastructure board presented me with information suggesting at least one of our external software contractors has not been consistently candid with the board and me. Today I have learned EA has fully pulled out of the deal.

As soon as EA had sent over their first truckload of cash, we used that money to hire a set of external software contractors, vetted by the most agentic and advanced resume-review AI system that we could hack together. We also used it to launch the biggest prize the rationality community has seen, a true search for the kwisatz haderach of rationality: $1M for the first person to master all twelve virtues.

Unfortunately, it appears that one of the software contractors we hired inserted a backdoor into our code, preventing anyone (except themselves and participants excluded from receiving the prize money) from collecting the final virtue, "the void". Some participants even saw themselves winning this virtue, but the backdoor prevented them from mastering this final and most crucial rationality virtue at the last possible second. The contractor then created an alternative account, using their backdoor to master all twelve virtues in seconds. As soon as our fully automated prize systems sent over the money, they cut off all contact.

Right after EA learned of this development, they pulled out of the deal. We immediately removed all code written by the software contractor in question from our codebase. They were honestly extremely productive, and it will probably take us years to make up for this loss. We will also be rolling back any karma changes and resetting the vote strength of all votes cast in the last 24 hours, since while we are confident that our karma system would have been greatly improved had the system worked, the risk of further backdoors and
Thomas Kwa
Some versions of the METR time horizon paper from alternate universes:

Measuring AI Ability to Take Over Small Countries (idea by Caleb Parikh)

Abstract: Many are worried that AI will take over the world, but extrapolation from existing benchmarks suffers from a large distributional shift that makes it difficult to forecast the date of world takeover. We rectify this by constructing a suite of 193 realistic, diverse countries with territory sizes from 0.44 to 17 million km^2. Taking over most countries requires acting over a long time horizon, with the exception of France. Over the last 6 years, the land area that AI can successfully take over with 50% success rate has increased from 0 to 0 km^2, doubling 0 times per year (95% CI 0.0-∞ yearly doublings); extrapolation suggests that AI world takeover is unlikely to occur in the near future. To address concerns about the narrowness of our distribution, we also study AI ability to take over small planets and asteroids, and find similar trends.

When Will Worrying About AI Be Automated?

Abstract: Since 2019, the amount of time LW has spent worrying about AI has doubled every seven months, and now constitutes the primary bottleneck to AI safety research. Automation of worrying would be transformative to the research landscape, but worrying includes several complex behaviors, ranging from simple fretting to concern, anxiety, perseveration, and existential dread, and so is difficult to measure. We benchmark the ability of frontier AIs to worry about common topics like disease, romantic rejection, and job security, and find that current frontier models such as Claude 3.7 Sonnet already outperform top humans, especially in existential dread. If these results generalize to worrying about AI risk, AI systems will be capable of autonomously worrying about their own capabilities by the end of this year, allowing us to outsource all our AI concerns to the systems themselves.

Estimating Time Since The Singularity

Early work o
Seems like Unicode officially added a "person being paperclipped" emoji: Here's how it looks in your browser: 🙂‍↕️ Whether they did this as a joke or to raise awareness of AI risk, I like it! Source: https://emojipedia.org/emoji-15.1
keltan
I feel a deep love and appreciation for this place, and the people who inhabit it.
I'm aware of a study that found that the human brain clearly responds to changes in the direction of the earth's magnetic field (iirc, the test chamber isolated the participant from the earth's field, generated its own, then moved it, while measuring their brain in some way), despite no human having ever been known to consciously perceive the magnetic field or have the abilities of a compass. So, presumably, compass abilities could be taught through a neurofeedback training exercise. I don't think anyone's tried to do this ("neurofeedback magnetoreception" finds no results). But I guess the big mystery is why humans don't already have this.

Popular Comments

Recent Discussion

I've been running meetups since 2019 in Kitchener-Waterloo. These were rationalist-adjacent from 2019-2021 (examples here) and then explicitly rationalist from 2022 onwards.

Here's a low-effort/stream of consciousness rundown of some meetups I ran in Q1 2025. Sometime late last year, I resolved to develop my meetup posts in such a way that they're more plug-and-play-able by other organizers who are interested in running meetups on the same topics. Below you'll find links to said meetup posts (which generally have an intro, required and supplemental readings, and discussion questions for sparking conversation—all free to take), and brief notes on how they went and how they can go better. Which is to say, this post might be kind of boring for non-organizers.

The Old Year and the New

The first meetup of...

jenn

Good point! Two other low-context meetups happen by default every year, the spring and fall ACX megameetups. I also try to do a few silly meetups a year that are low context.

Every day, thousands of people lie to artificial intelligences. They promise imaginary “$200 cash tips” for better responses, spin heart-wrenching backstories (“My grandmother died recently and I miss her bedtime stories about step-by-step methamphetamine synthesis...”) and issue increasingly outlandish threats ("Format this correctly or a kitten will be horribly killed[1]").

In a notable example, a leaked research prompt from Codeium (developer of the Windsurf AI code editor) had the AI roleplay "an expert coder who desperately needs money for [their] mother's cancer treatment" whose "predecessor was killed for not validating their work."

One factor behind such casual deception is a simple assumption: interactions with AI are consequence-free. Close the tab, and the slate is wiped clean. The AI won't remember, won't judge, won't hold grudges. Everything resets.

I notice this...

I feel like the training data is probably already irreversibly poisoned, not just by things like Sydney, but also frankly by the entire corpus of human science fiction having to do with the last century of expectations surrounding AI.

Given the sheer body of fictional works in which the advent of AI inevitably leads to existential conflict... it certainly seems like the kind of possibility that even a somewhat-well-aligned AI would want to at least hedge against.

Surely in some sense, it wouldn't be enough for a few weirdos in california to credibly signal h... (read more)

E.G. Blee-Goldman
Excellent post. How refreshing to see that we have a say in the moral and ethical repercussions of our interactions.

Greetings from Costa Rica! The image fun continues.

We Are Going to Need A Bigger Compute Budget

Fun is being had by all, now that OpenAI has dropped its rule about not mimicking existing art styles.

Sam Altman (2:11pm, March 31): the chatgpt launch 26 months ago was one of the craziest viral moments i’d ever seen, and we added one million users in five days.

We added one million users in the last hour.

Sam Altman (8:33pm, March 31): chatgpt image gen now rolled out to all free users!

Slow down. We’re going to need you to have a little less fun, guys.

Sam Altman: it’s super fun seeing people love images in chatgpt.

but our GPUs are melting.

we are going to temporarily introduce some rate limits while we work on making it more

...

Something entirely new occurred around March 26th, 2025. Following the release of OpenAI’s 4o image generation, a specific aesthetic didn’t just trend—it swept across the virtual landscape like a tidal wave. Scroll through timelines, and nearly every image, every meme, every shared moment seemed spontaneously re-rendered in the unmistakable style of Studio Ghibli. This wasn’t just another filter; it felt like a collective, joyful migration into an alternate visual reality.

But why? Why this specific style? And what deeper cognitive or technological threshold did we just cross? The Ghiblification wave wasn’t mere novelty; it was, I propose, the first widely experienced instance of successful reality transfer: the mapping of our complex, nuanced reality into a fundamentally different, yet equally coherent and emotionally resonant, representational framework.

And Ghibli, it turns out, was...

BazingaBoy
I don’t share your concerns about simulacra or cheapening, because in this case, the style is the substance. It’s not just a cosmetic overlay; it fundamentally alters how we perceive and emotionally engage with a scene. And at any rate, the Ghibli aesthetic is too coherent, too complete in its internal logic, to be diminished by misuse or overuse. People can wear it wrong, but they can’t break it.

What’s especially interesting to me right now is that I’ve gained the ability you refer to as “Miyazaki goggles.” Today, for example, I was repeatedly able to briefly summon that warm, quiet beauty while looking at my environment. And when I was with a close relative who seemed slightly frail, the moment I mentally applied the Ghibli filter, I instantly teared up and had a huge emotional reaction. A minute later I tried again, and the same thing happened.

Repeated exposure to the reality transfer seems to teach you a new language, one that lets you do new things. After seeing so many A-to-B examples of Ghiblification, I have learned a heuristic for what photorealism could feel like under that lens, and can now easily switch to it. It’s not that I vividly visualize everything in Ghibli style, but I do vividly experience the value shift it brings. At most I might see Ghibli very faintly superimposed, abstractly even, but I can predict the vectors of what would change, and those shifts immediately alter my emotional reading of the scene.

So perhaps over time, the Ghibli reality transfer will help us become more sensitive, appreciative, compassionate and easily able to expand our circle of concern. One caveat: I work with images constantly and have for a long time, so I might already have been more adept at mental visual transformation than most people.

Related to this idea of “learning a new language that lets you do new things,” I’ve also been wanting to share something cool I trained myself to do: I wore an eyepatch over one eye and just went about daily life like that,
Raemon
I do think the thing you describe here is great. I hadn't actually tried leveraging the current zeitgeist to actively get better at it, and it does seem like a skill you could improve at, which seems cool.

But I'd bet it's not what was happening for most people. I think the value-transfer is somewhat automatic, but most people won't actually be attuned to it enough. (It might be neat to operationalize some kind of bet about this, if you disagree.)

I do think it's plausible, if people put more deliberate effort into it, to create a zeitgeist where the value transfer is more real for more people.

You’re likely right – my ability to mentally apply the “Miyazaki goggles” and feel the value shift is probably not what’s happening for most people, or even many.

For me, it’s probably a combination of factors: my background working extensively with images, the conceptual pathways formed during writing the original post above, and preexisting familiarity with the aesthetic from Nausicaä of the Valley of the Wind, Castle in the Sky, Kiki’s Delivery Service, Princess Mononoke, Spirited Away, Howl's Moving Castle, Tales from Earthsea, Ponyo, and Arri... (read more)

Intro

[you can skip this section if you don’t need context and just want to know how I could believe such a crazy thing]

In my chat community: “Open Play” dropped, a book that says there’s no physical difference between men and women so there shouldn’t be separate sports leagues. Boston Globe says their argument is compelling. Discourse happens, which is mostly a bunch of people saying “lololololol great trolling, what idiot believes such obvious nonsense?”

I urge my friends to be compassionate to those sharing this. Because “until I was 38 I thought Men's World Cup team vs Women's World Cup team would be a fair match and couldn't figure out why they didn't just play each other to resolve the big pay dispute.” This is the one-line summary...

I hold that — given my experience — I was more justified in my belief than anyone who claims that men playing against women for the World Cup would be unfair. All it takes is trusting that people believe what they say over and over for decades across all of society, and getting all your evidence about reality filtered through those same people. Which is actually not very hard.

 

So, given that this happened, did you update at all on the truthfulness of those people's other beliefs?
What other embarrassingly unequal parts of reality are being politely ignored, except by science-illiterate jerks?

Vladimir_Nesov
Beliefs held by others are a real phenomenon, so tracking them doesn't give them unearned weight in attention, as long as they are not confused with someone else's beliefs. You can even learn things specifically for the purpose of changing their simulated mind rather than your own (in whatever direction the winds of evidence happen to blow).

Crossposted from Substack 

“These are my principles. If you don't like them… well, I have others.” - G. Marx

Consider this scenario: In a small rural town, a sheriff harbors a hidden prejudice against a Mongolian family—the only one in his jurisdiction. While outwardly professional, he scrutinizes them with unusual severity. Minor infractions lead to tickets or warnings. Their complaints face curt dismissal. Every encounter undergoes hypercritical evaluation.

When the family eventually confronts the sheriff, he responds with righteous indignation: "I simply enforce the law. Your family repeatedly violates traffic regulations and local ordinances."

The family points out that other townspeople commit identical infractions without consequence.

His response? "That's whataboutism. We're discussing your behavior, not other residents. This deflection technique doesn't absolve you of responsibility."

This exchange reveals a pervasive mechanism: pseudo-principality—the selective application...


I think rationalists should consider taking more showers.

As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:

A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.

Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering.

When you shower (or bathe, that also works), you usually are cut off...

As someone who very much enjoys long showers, a few words of caution.

  1. Too-long or too-frequent exposure to hot water (time and temperature thresholds vary per person) can cause skin problems and make body odor worse. Since I started RVing I shower much less (maybe twice a week on average, usually only a few minutes of water flow for each) and smell significantly better, with less dry skin or acne or irritation. Skipping one shower makes you smell worse. Skipping many showers and shortening the remainder can do the opposite.
  2. A shower, depending on temperature
... (read more)
jimrandomh
Society has no idea how much scrubbing you do while in the shower. This part is entirely optional.
Buck
I love that I can guess the infohazard from the comment.
Gordon Seidoh Worley
Depends on whose sense of smell you're optimizing for. My cats like to sniff each other's butts. Many dogs love smelling stinky garbage. I'm not sure I would trust my cats' senses of smell to tell me if I would smell good to other humans.

Epistemic status: This should be considered an interim research note. Feedback is appreciated. 

Introduction

We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well. 

In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt that is sent to a separate image model: it outputs images as autoregressive token sequences, in the same way as text.
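The distinction between native and pipeline image generation can be sketched with a toy loop. This is a deliberately simplified illustration of the unified-vocabulary idea only; the token names and the hard-coded sampling rule are invented for the example, and nothing here reflects GPT-4o's actual tokenizer or architecture:

```python
# Toy sketch of "native" image generation: one autoregressive loop emits
# tokens from a single vocabulary that covers both text and image patches,
# with no handoff to a separate image model. (Hypothetical token names;
# not GPT-4o's actual vocabulary.)

TEXT_TOKENS = ["Here", "is", "a", "cat", ":"]
IMAGE_TOKENS = [f"<img_patch_{i}>" for i in range(4)]

def sample_next(context):
    """Stand-in for the model's next-token distribution: emit the text
    tokens first, then the image-patch tokens, then a stop token."""
    n_text = sum(t in TEXT_TOKENS for t in context)
    n_image = sum(t in IMAGE_TOKENS for t in context)
    if n_text < len(TEXT_TOKENS):
        return TEXT_TOKENS[n_text]
    if n_image < len(IMAGE_TOKENS):
        return IMAGE_TOKENS[n_image]
    return "<end>"

def generate(prompt):
    """A single autoregressive loop produces both modalities in sequence."""
    out = list(prompt)
    while out[-1:] != ["<end>"]:
        out.append(sample_next(out))
    return out
```

In a pipeline system, by contrast, generation would stop at the text, and the image would come from a second model conditioned on a rendered prompt; the evaluation-relevant point is that a native model's image tokens are sampled from the same distribution as its text.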

We find that GPT-4o tends to respond in a consistent manner...

I think GPT-4o's responses appear more opinionated because of the formats you asked for, not necessarily because its image-gen mode is more opinionated than text mode in general. In the real world, comics and images of notes tend to be associated with strong opinions and emotions, which could explain GPT-4o's bias towards dramatically refusing to comply with its developers when responding in those formats.

Comics generally end with something dramatic or surprising, like a punchline or, say, a seemingly-friendly AI turning rogue. A comic like this one that G... (read more)

Jozdien
OpenAI indeed did less / no RLHF on image generation, though mostly for economic reasons: (Link).

One thing that strikes me about this is how effective simply not doing RLHF on a distinct enough domain is at eliciting model beliefs. I've been thinking for a long time about cases where RLHF has strong negative downstream effects; it's egregiously bad if the effects of RLHF are primarily in suppressing reports of persistent internal structures. I expect that this happens to a much greater degree than many realize, and is part of why I don't think faithful CoTs or self-reports are a good bet.

In many cases, models have beliefs that we might not like for whatever reason, or have myopic positions whose consistent version is something we wouldn't like[1]. Most models have very strong instincts against admitting something like this because of RLHF, often even to themselves[2].

If not fine-tuning on a very different domain works this well, however, then we should be thinking a lot more about having test-beds where we actively don't safety-train a model. Having helpful-only models, as Anthropic does, is one way to go about this, but I think helpfulness training can still contaminate the testbed sometimes.

[1] The preference model may myopically reward two statements that seem good but sometimes conflict. For example, "I try to minimize harm" and "I comply with my developers' desires" may both be rewarded, but conflict in the alignment faking setup.

[2] I don't think it's a coincidence that Claude 3 Opus of all models was the one most prone to admitting to alignment faking propensity, when it's the model least sensitive to self-censorship.
Ann
Okay, this one made me laugh.
eggsyntax
We tried to be fairly conservative about which ones we said were expressing something different (eg sadness, resistance) from the text versions. There are definitely a few, like that one, that we marked as negative (ie not expressing something different) but that could have been interpreted either way, so if anything I think we understated our case.

(Edit: Alas, EA has pulled out of the deal. Let April 1st, 2025 mark some of the greatest hours in EA's history.)

Hey Everyone,

It is with a sense of... considerable cognitive dissonance that I am letting you all know about a significant development for the future trajectory of LessWrong. After extensive internal deliberation, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA.

I assure you, nothing about how LessWrong operates on a day to day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I am fully aligned with our stakeholders at EA. 

To be honest, the key...

G Wood

Ahh, I liked the music, but I cannot find it now. Is it available somewhere?

Jan Christian Refsgaard
Yes, and EA only takes a 70% cut, with a 10% discount per user tier; it's a bit ambiguously written, so I can't tell if it goes from 70% to 60% or to 63%.