All Comments

Publishing academic papers on transformative AI is a nightmare

I am not an expert in starting new journals, but I think that one is certainly needed. And it needs to be mainstream, which means in particular: listed in Clarivate SCIE/SSCI, Scopus, etc. It should apply for an official IF and so on.

Anthropic Commits To Model Weight Preservation

kornai4m10

Instead of 5% here 5% there we should consider a baseline of how much societal effort goes into maintaining cemeteries/necropolises. This differs from society to society, there are choices to be made here, but it's hard to imagine a civilization without such.

The Zen Of Maxent As A Generalization Of Bayes Updates

Menotim10m20

I think that's a good way of phrasing it, except that I would emphasize that these are two different states of knowledge, not necessarily two different states of the world.

I didn't think it would work out to the maximum entropy distribution even in your first case, so I worked out an example to check:

Suppose we have a three-sided die, that can land on 0, 1 or 2. Then suppose we are told the die was rolled several times, and the average value was 1.5. The maximum entropy distribution is (if my math is correct) probability 0.116 for 0, 0.268 for 1 and 0.616 ... (read more)
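The numbers quoted above can be checked numerically. The following is a minimal sketch, assuming the standard exponential-family form of the maximum entropy solution under a mean constraint (probabilities proportional to exp(lam * value)); the function name `maxent_die` and the bisection bounds are illustrative choices, not anything from the comment.

```python
import math

def maxent_die(values, target_mean, lo=-20.0, hi=20.0, iters=200):
    """Maximum-entropy distribution over `values` with a fixed mean.

    Assumes the solution has the form p_k proportional to exp(lam * v_k)
    and finds lam by bisection (the mean is increasing in lam).
    """
    def mean_for(lam):
        weights = [math.exp(lam * v) for v in values]
        z = sum(weights)
        return sum(v * w for v, w in zip(values, weights)) / z

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * v) for v in values]
    z = sum(weights)
    return [w / z for w in weights]

probs = maxent_die([0, 1, 2], 1.5)
print([round(p, 3) for p in probs])  # [0.116, 0.268, 0.616]
```

For this particular die the constraint reduces to x^2 - x - 3 = 0 with x = exp(lam), so x = (1 + sqrt(13)) / 2, which agrees with the values in the comment.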

Stephen McAleese's Shortform

Stephen McAleese11m20

Epoch AI has a map of frontier AI datacenters: https://epoch.ai/data/data-centers/satellite-explorer

People Seem Funny In The Head About Subtle Signals

Caleb Biddulph19m30

The things you're saying may be true, but I'm not sure the Slytherin necklace is a super good example. I feel like she put on the necklace that morning and had a moment where she thought "haha this is Slytherin-coded," and she wanted to share that feeling with you in a playful way. I doubt she was thinking "when I wear this necklace, I predict that people will associate me with Slytherin. I shall now test this hypothesis by asking John."

My very uninformed model of this girl says that if she read this post, she'd kind of roll her eyes and say "lol it really wasn't that deep." But only she could say for sure.

Consider donating to AI safety champion Scott Wiener

Nathaniel24m10

Pelosi announced she's retiring!

https://apnews.com/article/pelosi-reelection-announcement-fd95c18815fdabdaabaf26b8c2f0bafc

Eric Neyman's Shortform

CounterBlunder35m10

Got it. Okay thanks!

Eric Neyman's Shortform

Zach Stein-Perlman36m20

Yep, e.g. donations sooner are better for getting endorsements. Especially for Bores and somewhat for Wiener, I think.

Anthropic Commits To Model Weight Preservation

leogao36m40

you could plausibly do this, and it would certainly reduce maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.

High-level actions don’t screen off intent

Matt Goldenberg45m20

Good point.

Fake media seems to be a fact of life now

Gavin Runeblade1h10

How it started: pics or it didn't happen.

How it's going: IRL or it didn't happen.

I think there is a window of opportunity for humans to create a reputation for legitimacy and a venue for official information. Consider Neil deGrasse Tyson and the recent flat earth fake. People know where to check to see if he really changed his mind. He has a valid place for things to appear, and a reputation.

Then consider a purported leaked recording of a politician. There's no way to validate or invalidate it. It is a leak, so you expect the politician to deny it whether ... (read more)

The Zen Of Maxent As A Generalization Of Bayes Updates

Daniel C1h30

IIUC there are two scenarios to be distinguished:

One is that the die has bias p unknown to you (you have some prior over p) and you use i.i.d flips to estimate bias as usual & get maxent distribution for a new draw. The draws are independent given p but not independent given your priors, so everything works out.

The other is that the die is literally i.i.d over your priors. In this case everything from your argument routes through: Whatever bias/constraint you happen to estimate from your outcome sequence doesn't say anything about a new i.i.d draw because they're uncorrelated; the new draw is just another sample from your prior.

The Unreasonable Effectiveness of Fiction

PhilGoetz1h1-9

I know many people whose lives were radically changed by The Lord of the Rings, The Narnia Chronicles, Star Wars, or Ender's Game.

The first three spawned a vast juvenile fantasy genre which convinces people that they're in a war between pure good and pure evil, in which the moral thing to do is always blindingly obvious. (Star Wars at least had a redemption arc, and didn't divide good and evil along racial lines. In LotR and Narnia, as in Marxism and Nazism, the only possible solution is to kill or expel every member of the evil races/classes.) ... (read more)

A 2032 Takeoff Story

Adam B1h50

On a quick glance it looks like the intention is (partially) to promote a memecoin: https://www.ai-2028.com/today/coin

AI #141: Give Us The Money

mishka1h20

My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.

We already have viXra, with its own "can of worms" to say the least, https://en.wikipedia.org/wiki/ViXra.

And if I currently go to https://vixra.org/, I see that they do have the same problem, and this is how they are dealing with it:

Notice: viXra.org only accept

... (read more)

3b. Formal (Faux) Corrigibility

Max Harms1h20

Thanks! And thanks for reading!

I talk some about MIRI's 2015 misstep here (and some here). In short, it is hard to correctly balance arbitrary top-level goals against an antinatural goal like shutdownability or corrigibility, and trying to stitch corrigibility out of sub-pieces like shutdownability is like trying to build an animal by separately growing organs and stitching them together -- the organs will simply die, because they're not part of a whole animal. The "Hard Problem" is the glue that allows the desiderata to hold together.

I discuss a range of ... (read more)

Mo Putera's Shortform

Mo Putera2h20

Every once in a while I think about Robert Freitas' 1984 essay Xenopsychology, in particular his Sentience Quotient (SQ) idea:

It is possible to devise a sliding scale of cosmic sentience universally applicable to any intelligent entity in the cosmos, based on a "figure of merit" which I call the Sentience Quotient. The essential characteristic of all intelligent systems is that they process information using a processor or "brain" made of matter-energy. Generally the more information a brain can process in a shorter length of time, the more intellige

... (read more)

Daniel Kokotajlo's Shortform

Daniel Kokotajlo2h20

Just came across this old philosophy class paper of mine, basically arguing against eliminativism in philosophy of mind: https://docs.google.com/document/d/1FLGF4bKj0blFyn8JPeXa73DBhigNKX3Wecujcv4AOjQ/edit?usp=sharing

I still stand by it I think. Curious if anyone has thoughts. Feel free to leave comments in the doc.

Publishing academic papers on transformative AI is a nightmare

MattJ2h10

Sounds like an excellent idea. The Journal of Existential Risk of AI.

A 2032 Takeoff Story

peterbarnett2h74

Someone please explain

More Reactions to If Anyone Builds It, Everyone Dies

Jesper L.2h10

His actual top objection is that even if we do manage to get a controlled and compliant ASI, that is still extremely destabilizing at best and fatal at worst.

Michael Nielsen brings forth a very valid concern, which should have made a lot of Alignment researchers update their beliefs already.

We currently don't know what a benevolent OR compliant ASI would look like, or how it may end up affecting humanity (and our future agency). Worse, I doubt we can distinguish success from failure.

The Unreasonable Effectiveness of Fiction

Ryan Meservey2h30

Richard Rorty argued that stories, rather than ethical principles, are at the heart of morality. For Rorty, the basic question of morality is which groups to recognize as persons entitled to respect. Stories about women and slaves made privileged people recognize them as people who matter.

Within Rorty's framing, it feels like The Wild Robot, Wall-E, and stories like that prime us to (eventually) recognize the personhood of robots. I suppose those would be important stories if we succeeded in creating conscious entities that desire to continue living*, but ... (read more)

Anthropic Commits To Model Weight Preservation

Shankar Sivarajan2h20

absolutism, treating their conclusions and the righteousness of their cause as obvious, and assuming it should override ordinary business considerations.

It doesn't take certainty in any position to criticize driving at half-speed.

What's up with Anthropic predicting AGI by early 2027?

Daniel Kokotajlo2h20

My guess would be that OpenAI and Anthropic both lowball their financial estimates for strategic reasons. Better for your already-very-ambitious targets to be exceeded repeatedly, than to propose even one so-ambitious-you-sound-like-an-insane-cult target which you then fail to meet.

People Seem Funny In The Head About Subtle Signals

Alex_Altair2h20

Some subtle signals perhaps?

Eric Neyman's Shortform

CounterBlunder2h10

Earnest question: For both this & donating to Alex Bores, does it matter whether someone donates sooner rather than a couple months from now? For practical reasons, it will be easier for me to donate in 2026--but if it will have a substantially bigger impact now, then I want to do it sooner.

Geometric UDT

cousin_it2hΩ120

Sure, but if we put a third "if" on top (namely, "it's a representation of our credences, but also both hypotheses are nosy neighbors"), doesn't that undo the second "if" and bring us back to the first?

Viliam's Shortform

Seth Herd2h30

I agree. I am a psychologist (cognitive not clinical) by training, who reads technical articles, and I see those parallels constantly.

This put me in mind of writing a short post titled something like "alignment includes psychology, whether we like it or not". My previous short form on psychology and alignment was my most downvoted ever. I think it's a repulsive concept to the types of people who work on alignment, for bad reasons and good. I think there are good reasons for being horrified if alignment requires a psychological approach. Psychology knows ve... (read more)

dirk's Shortform

Seth Herd2h20

Is this equally true of GPT5 and Sonnet 4.5? They're the first models trained with reducing sycophancy as one objective.

I agree in general.

The Zen Of Maxent As A Generalization Of Bayes Updates

Menotim3h30

I could do better by imagining that I will have infinitely many independent rolls, and then updating on that average being exactly 2.0 (in the limit). IIUC that should replicate the max relative entropy result (and might be a better way to argue for the max relative entropy method), but I have not checked that myself.

I had thought about something like that, but I'm not sure it actually works. My reasoning (which I expect might be close to yours, since I learned about this theorem in a post of yours) was that by the entropy concentration theorem, most outco... (read more)

People Seem Funny In The Head About Subtle Signals

Linda Linsefors3h40

Thanks :)

I will reveal the true answer to 2 in about a week, in case anyone else wants to take a guess.

OpenAI: The Battle of the Board: Ilya’s Testimony

Jesper L.3h10

I mistakenly believed this was common knowledge by now. Sam Altman's history goes way back.

I recommend reading 'Empire of AI' by journalist Karen Hao for an extensive breakdown of all the controversies relating to Altman, OpenAI and the US AI boom of the last few years. If you read it, you might not agree with all the analysis, especially re. whom to blame for what, but it is factual.

Geometric UDT

abramdemski3hΩ220

Wait, do you think value uncertainty is equivalent/reducible to uncertainty about the correct prior?

Yep. Value uncertainty is reduced to uncertainty about the correct prior via the device of putting the correct values into the world as propositions.

Would that mean the correct prior to use depends on your values?

If we construe "values" as preferences, this is already clear in standard decision theory; preferences depend on both probabilities and utilities. UDT further blurs the line, because in the context of UDT, probabilities feel more like a "carin... (read more)

The Zen Of Maxent As A Generalization Of Bayes Updates

Menotim3h30

In the example in the post, what would you say is the "prior distribution over sequences of results"?

I don't actually know.

If it's a binary experiment, like a "biased coin" that outputs either Heads or Tails, an appropriate distribution is Laplace's Rule of Succession (like I mentioned). Laplace's Rule has a parameter $p$ that is the "objective probability" of Heads, in the sense that if we know $p$ our probabilities for each result giving Heads is $p$ independently. (I don't think it makes sense to think of $p$ as an actual... (read more)
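For concreteness, here is a minimal sketch of Laplace's Rule of Succession, assuming its usual form with a uniform prior over $p$: after seeing some number of Heads in a number of flips, the probability that the next flip is Heads is (heads + 1) / (flips + 2). The function name `rule_of_succession` is illustrative.

```python
from fractions import Fraction

def rule_of_succession(heads, flips):
    """P(next flip is Heads | `heads` Heads in `flips` flips),
    assuming a uniform prior over the bias p."""
    if not 0 <= heads <= flips:
        raise ValueError("need 0 <= heads <= flips")
    return Fraction(heads + 1, flips + 2)

print(rule_of_succession(0, 0))  # 1/2: no data, so the prediction is even odds
print(rule_of_succession(3, 3))  # 4/5: three Heads in a row
```

Exact fractions are used so the results are easy to compare by hand.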

The Zen Of Maxent As A Generalization Of Bayes Updates

johnswentworth3h61

Good questions, those are exactly the sorts of things which confused me when learning this stuff! And sometimes still do confuse me.

Even if you don't know anything other than the average value, you can still take your distribution over sequences of results, update it on this information (eliminating the possible outcome sequences that don't have this average value), and then find the distribution P(NextResult|AverageValue) by integrating P(NextResult|PastResults)P(PastResults|AverageValue) over the possible PastResults.

This part is the easiest to answer.

Su... (read more)
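The update described in the quoted paragraph can be carried out by brute force for a small case. This is a sketch under stated assumptions: a three-sided die, and a uniform Dirichlet prior over its bias as the "distribution over sequences of results" (a choice made here for illustration, not one taken from the thread). The names `seq_prior`, `predictive` and `next_given_average` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def seq_prior(seq, k=3):
    """P(PastResults) under a uniform Dirichlet prior over a k-sided die:
    (k-1)! * prod(n_i!) / (n + k - 1)!, where n_i counts outcome i."""
    num = factorial(k - 1)
    for v in range(k):
        num *= factorial(seq.count(v))
    return Fraction(num, factorial(len(seq) + k - 1))

def predictive(seq, k=3):
    """P(NextResult = v | PastResults) under the same prior."""
    return [Fraction(seq.count(v) + 1, len(seq) + k) for v in range(k)]

def next_given_average(n, total, k=3):
    """P(NextResult | AverageValue = total / n), summing
    P(NextResult | PastResults) * P(PastResults | AverageValue)
    over every past sequence consistent with the average."""
    consistent = [s for s in product(range(k), repeat=n) if sum(s) == total]
    weights = [seq_prior(s, k) for s in consistent]
    z = sum(weights)
    out = [Fraction(0)] * k
    for s, w in zip(consistent, weights):
        for v, p in enumerate(predictive(s, k)):
            out[v] += p * w / z
    return out

result = next_given_average(2, 2)  # two rolls averaging 1.0
print(result)  # [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]
```

Enumeration grows as k^n, so this only works for short sequences; it is meant to make the two-step "condition, then marginalize" structure visible, not to be efficient.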

Geometric UDT

abramdemski3hΩ220

When I try to understand the position you're speaking from, I suppose you're imagining a world where an agent's true preferences are always and only represented by their current introspectively accessible probability+utility,^[1] whereas I'm imagining a world where "value uncertainty" is really meaningful (there can be a difference between the probability+utility we can articulate and our true probability+utility).

If 50% rainbows and 50% puppies is indeed the best representation of our preferences, then I agree: maximize rainbows.

If 50% rainbows and 50... (read more)

Legible vs. Illegible AI Safety Problems

TristanTrim3h10

I agree. I've been trying to discuss some terminology that I think might help, at least with discussing the situation. I think "AI" is generally a vague and confusing term and what we should actually be focused on are "Outcome Influencing Systems (OISs)", where a hypothetical ASI would be an OIS capable of influencing what happens on Earth regardless of human preferences. However, humans are also OISs, as are groups of humans, and in fact the "competitive pressure" you mention is a kind of very powerful OIS that is already misaligned and in many ways supe... (read more)

People Seem Funny In The Head About Subtle Signals

Gesild Muka3h30

I would chalk this up to we simply don't know each other as well as we think we do. We think we're good at interpreting facial expressions, body language and style choices until the rare instances where we can check our assumptions against what the observed person is actually thinking/feeling. Society and culture (context?) probably play a big part in our understanding or lack of understanding.

Viliam's Shortform

Viliam3h156

I am fascinated by how often I read something about LLMs and it seems to illustrate something about human psychology. I wonder how many psychologists think about these things. (I suspect not many, because psychologists typically don't read technical articles about LLMs.)

For example, in "GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash" the part "Bias-augmented Consistency Training", specifically "Train the model via SFT to give the clean response ... when shown the wrapped prompt"... that reminds me strongly of "Asch’s Co... (read more)

The Unreasonable Effectiveness of Fiction

ScottN4h20

Other than Mycroft being a result of spontaneous consciousness, the computer in Heinlein's "The Moon is a Harsh Mistress" was not too far off from being an LLM, as was Minerva in "Time Enough for Love".

Legible vs. Illegible AI Safety Problems

TristanTrim4h10

I agree on both points. To the first, I'd like to note that classifying "kinds of illegibility" seems worthwhile. You've pointed out one example, the "this will affect future systems but doesn't affect systems today". I'd add three more to make the possibly incomplete set:

This will affect future systems but doesn't affect systems today.
This relates to an issue at a great inferential distance; it is conceptually difficult to understand.
This issue stems from an improper framing or assumption about existing systems that is not correct.
This issue is emoti

johnswentworth4h20

These posts are not a particularly representative window into my dating efforts/thoughts/etc.

The main driver of the posts is me being like "man, why is my memetic environment feeding me all this stuff about dating which just clearly isn't true?", and sometimes I get sufficiently pissed off at my memetic environment to push back.

I like to go salsa dancing and I feel a lot more relaxed and playful when doing it compared to when I was "looking" for romance? I just bring a different more secure energy and I just stop worrying and start vibing? I agree with you

TristanTrim4h10

The "morality is scary" problem of corrigible AI is an interesting one. Seems tricky to at least a first approximation in that I basically don't have an estimate on how much effort it would take to solve it.

Your rot13 suggestion has the obvious corruption problem, but also has the problem of public relations for the plan. I doubt it would be popular. However, I like where your head is at.

My own thinking on the subject is closely related to my "Outcome Influencing System (OIS)" concept. Most complete and concise summary here. I should write an explainer pos... (read more)

Legible vs. Illegible AI Safety Problems

TristanTrim4h10

I see a lot of people dismissing the agent foundations era and I disagree with it. Studying agents seems even more important to me than ever now that they are sampled from a latent space of possible agents within the black box of LLMs.

To throw out a crux, I agree that if we have missed opportunities for progress towards beneficial AI by trying to avoid advancing harmful capabilities, that would be a bad thing, but my internal sense of the world suggests to me that harmful capabilities have been advanced more than opportunities have been missed. But unfortunately, that seems like a difficult claim to try to study in any sort of unbiased, objective way, one way or the other.

Breaking Books: A tool to bring books to the social sphere

Trevor Hill-Hand4h20

I love this idea, it feels like it would also work for a lot of non-fiction, and I could see this being a part of a traditional book club too.

David James's Shortform

David James4h30

Asking even a good friend to take the time to read The Sequences (aka Rationality A-Z) is a big ask. But how else does one absorb the background and culture necessary if one wants to engage deeply in rationalist writing? I think we need alternative ways to communicate the key concepts that vary across style and assumed background. If you know of useful resources, would you please post them as a comment? Thanks.

Some different lenses that could be helpful:

“I already studied critical thinking in college, why isn’t this enough?”
“I’m already a practicing

TristanTrim4h10

This is a good point of view. What we have is a large sociotechnical system moving towards global catastrophic risk (GCR). Some actions cause it to accelerate or remove brakes, others cause it to steer away from GCR. So "capabilities vs alignment" is directly "accelerate vs steer", while "legible vs illegible" is like making people think we can steer, even though we can't, which in turn makes people ok with acceleration, and so it results in "legible vs illegible" also being "accelerate vs steer".

The important factor there is "people think we can steer". I... (read more)

What's up with Anthropic predicting AGI by early 2027?

james oofou4h40

It seems that your argument is based on high confidence in a METR time-horizon doubling time of roughly 7 months. But the available evidence suggests the doubling time is significantly lower.

In recent years we have observed shorter doubling times:

METR found that the time horizon has doubled every 7 months, possibly accelerating to every 4 months in 2024.

And what we know about labs' internal models suggests this faster trend is holding up:

An important piece of evidence is OpenAI’s Gold performance at the International Mathematics Olympiad (IMO):

IMO pa

... (read more)
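To see how much the choice of doubling time matters, here is a small arithmetic sketch. It assumes clean exponential growth, so the time horizon multiplies by 2^(months / doubling time); the function name `growth_factor` is illustrative, and only the 7-month and 4-month figures come from the comment.

```python
def growth_factor(months, doubling_time_months):
    """Multiplicative growth over `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_time_months)

for d in (7, 4):
    print(f"doubling every {d} months -> x{growth_factor(12, d):.2f} per year")
```

Under that assumption, a 7-month doubling time gives roughly a 3.3x increase per year, while a 4-month doubling time gives 8x.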

Eric Neyman's Shortform

Eric Neyman4h*4815

Nancy Pelosi is retiring; consider donating to Scott Wiener.

[Link to donate; or consider a bank transfer option to avoid fees, see below.]

Nancy Pelosi has just announced that she is retiring. Previously I wrote up a case for donating to Scott Wiener, an AI safety champion in the California legislature who is running for her seat, in which I estimated a 60% chance that Pelosi would retire. While I recommended donating on the day that he announced his campaign launch, I noted that donations would look much better ex post in worlds where Pelosi retires, and t... (read more)

I ate bear fat with honey and salt flakes, to prove a point

nim4h20

Or, why do we not salt ice cream?

I consider it pretty normal to encounter salt as an integral component of fancy ice cream flavors, but my biases are formed from places like https://saltandstraw.com/collections/all-flavors

A 2032 Takeoff Story

Mitchell_Porter4h20

Your plan is like a miniature version of what all the big AI companies are doing or will be doing...

Comparative advantage & AI

Simon Lermen4h20

Ok, so from a quick look I find this article on trading with ants unusually weak.

"Surveillance and spying"

Yes, but ants couldn't possibly understand anything we would be looking for. It's not just that they don't have language; they have a fundamentally lower level of understanding. They couldn't tell us "are the Chinese building new submarines?" They also couldn't perform these tasks, since ants are too stupid to follow any human orders. An ant doesn't just go off and do some newly specified job; they do the same stuff every day, like looking... (read more)

Comparative advantage & AI

Noosphere895h20

You didn't actually answer the question posed, which was "Why couldn't humans and ASI have peaceful trades even in the absence of empathy/love/alignment to us rather than killing us?" and not "Why would we fail at making AIs that are aligned/have empathy for us?"

What's up with Anthropic predicting AGI by early 2027?

Mitchell_Porter5h72

I don't know what Anthropic's official way of thinking about these things is, but to me, actually creating "a country of geniuses in a data center" is not an event that you can fit into a forecast of future earnings. It's an event that should lead rapidly to superintelligence, singularity, and the outright replacement of the world as we know it, by some new order of being. It doesn't surprise me that they would leave it out of their financial estimates.

People Seem Funny In The Head About Subtle Signals

niplav5h*260

Hm, I am unsure how much to believe this, even though my intuitions go the same way as yours. As a correlational datapoint, I tracked my success from cold approach and the time I've spent meditating (including a 2-month period of usually ~2 hours of meditation/day), and don't see any measurable improvement in my success rate from cold approach:

(Note that the linked analysis also includes a linear regression of slope -6.35e-08, but with p=0.936, so could be random.)

In cases where meditation does stuff to your vibe-reading of other people, I would guess that... (read more)

Jemist's Shortform

Daniel C5h10

I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without steering---but not the other way around---because of the way time works.

Alternatively, for learning your brain can start out in any given configuration, and it will end up in the same (small set of) final configuration (one that reflects the world); for steering the world can start out in any given configuration, and it will e... (read more)

The Unreasonable Effectiveness of Fiction

Catherine Caldwell-Harris5h30

A large academic literature exists on how people who read fiction have more empathy. Granted, causality could go both directions. See https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=people+who+read+lots+of+fiction+have+more+empathy&btnG=

Jemist's Shortform

testingthewaters5h20

See also this paper about plasticity as dual to empowerment https://arxiv.org/pdf/2505.10361v2

Comparative advantage & AI

J Bostock5h30

Fair question. It might have been better to phrase this as "Something ASI won't have towards us without much more effort and knowledge than we are currently putting into making ASI be friendly."

The answer is *gestures vaguely at entire history of alignment arguments* that I agree with the Yudkowsky position. To roughly summarise:

Empathy is a very specific way of relating to other minds, and which isn't even obviously well-defined when the two minds are very different; e.g. what does it mean to have empathy towards an ant, or a colony of ants? And humans an... (read more)

People Seem Funny In The Head About Subtle Signals

Mateusz Bagiński5h40

40%?
No, my impression has always been that you aim for comfy clothes.
1. Maybe modulo cases of you wearing an AI Safety Camp t-shirt or something like that.
2. Maybe you're kinda trying to signal preference for comfy clothes in addition to that by deliberately trying to choose clothes that someone would choose iff they prioritize comfiness above all else. Not that I have any specific evidence of that, just putting a hypothesis on the table.

Comparative advantage & AI

toasty_sunbeam5h1-2

The linked post lists 19 economically valuable things ants could trade to us, if we could communicate with them.

Comparative advantage & AI

toasty_sunbeam5h10

Agreed. We don't trade with ants because we can't. If we could, there are lots of mutually profitable trades we could make.

Comparative advantage & AI

toasty_sunbeam5h10

The main reasons not to are that we have some level of empathy/love towards nature and animals, something ASI won't have towards us.

Why are you so confident about that?

Leaving Open Philanthropy, going to Anthropic

Greg C6h10

Thanks, that's good to hear. What form does the pledge take? Do you have a DAF that contains half your shares? When do you think the next liquidation opportunity might be? (I guess you weren't eligible for the one in May^[1]?)

^[1] I'm disappointed that no one (EA-ish or otherwise) seems to have done anything interesting with that liquidation opportunity.

Jemist's Shortform

J Bostock6h90

Steering as Dual to Learning

I've been a bit confused about "steering" as a concept. It seems kinda dual to learning, but why? It seems like things which are good at learning are very close to things which are good at steering, but they don't always end up steering. It also seems like steering requires learning. What's up here?

I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without ... (read more)

Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

Igor Ivanov6h10

Ryan, did anything come out as a result of your synthetic input generation project proposal?

Human Values ≠ Goodness

Linda Linsefors6h20

To some extent "goodness" is some ever moving negotiated set of norms of how one should behave.

I notice that when I use the word "good" (or invoke this concept using other words such as "should"), I don't use it to point to the existing norms, but as a bid for what I think these norms should be. This sometimes overlaps with the existing norms and sometimes not.

E.g. I might say that it's good to allow lots of different subcultures to co-exist. This is a vote for a norm where people who don't share my subculture leave me and my friends alone, in exchange for us leav... (read more)

Mateusz Bagiński's Shortform

Mateusz Bagiński6h120

In his MLST podcast appearance in early 2023, Connor Leahy describes Alfred Korzybski as a sort of "rationalist before the rationalists":

Funny story: rationalists actually did exist, technically, before or around World War One. So, there is a Polish nobleman named Alfred Korzybski who, after seeing horrors of World War One, thought that as technology keeps improving, well, wisdom's not improving, then the world will end and all humans will be eradicated, so we must focus on producing human rationality in order to prevent this existential catastrophe. This

... (read more)

Our ancestors didn't know their faces

Alexandre Variengien6h10

Yup indeed! See the other comment thread below

Our ancestors didn't know their faces

Alexandre Variengien6h10

I edited the post to reflect this! (pun intended)

The Zen Of Maxent As A Generalization Of Bayes Updates

Linda Linsefors7h60

In this example, Mr. A has learned the average numbers of red, yellow, and green orders for some past days and wants to update his predictions of today's orders on this information. So he decides that the expected values of his distributions should be equal to those averages, and that he should find the distribution that makes the least assumptions, given those constraints. I at least agree that entropy is a good measure of how few assumptions your distribution makes. The point I'm confused about is how you get from "the average of this number in past o

... (read more)

Our ancestors didn't know their faces

Alexandre Variengien7h10

Went to the kitchen and tried to fill a bowl with water. I think you are right: I underestimated how easy it is to see a reflection in water. I believe it is unlikely for someone to spend a lifetime without seeing their face (blind people aside), though maybe it still happens in arid desert areas, or among people living in the Arctic?

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

artkpv7h10

We have demonstrated that steganography in terms of internal activations is indeed possible: a model can embed a hidden message within its internal representations while producing a coherent public response, which is a necessary condition for persistent hidden reasoning.
However, in our experiments, this hidden reasoning is not actually hidden — it is contained within the activations over the instruction itself. It seems that we can sleep peacefully if we use small LLMs. But…

Interesting idea and setup, especially the use of the translator ... (read more)

The Unreasonable Effectiveness of Fiction

FiftyTwo7h31

I agree with everything you've said. If anything, I think the effect is underrated because it's socially taboo to admit we've been majorly influenced by fiction. We all want to convey that we are Very Serious People who make decisions by reading serious scientific papers, not that we got into environmentalism because we watched Fern Gully as a kid, or whatever.

Part of the challenge with using fiction to persuade people is that fiction is often most effective for conveying views when it's not being explicitly didactic, e.g., compare Soviet and Chinese propa... (read more)

Heroic Responsibility

Linda Linsefors7h40

The same concept was independently invented by a larp organiser I know. Unfortunately I strongly dislike the words they chose, so I will not repeat them. But it occurs to me that the concept of "final responsibility", or "the buck stops here", is so universally useful that it's weird that there isn't some more common term for it.

Heroic Responsibility

Dumbledore's Army7h30

As several commenters here have said, the business owner example isn't a great fit for heroic responsibility. The core is taking responsibility for things that aren't your job, that you are not socially expected to be responsible for, because you have decided that the thing needs to be done.

The archetypal fictional example is the hero who raises the rebellion that overthrows the Evil Empire. A normal sensible peasant whose home has just been burned doesn't do that, he just tries to survive the winter. The hero decides to do more than that, even though it's... (read more)

Why Is Printing So Bad?

Linda Linsefors7h40

I notice that everything you list has to do with finding things. This matches my experience. Printing is hell whenever I try to print somewhere new. And since I print so rarely nowadays, this is the typical experience. But I remember a time when I printed more often; then it was mostly just click "print" and it worked.

It seems like printers are built to be set up once, and then be your forever printer? Which is no longer a good match for how you (and I) use printers.

I ate bear fat with honey and salt flakes, to prove a point

aggliu8h125

Bears are wild animals. I think it would take way too much effort to get a large enough consistent supply of autumn bear fat even for a food truck, especially given that people probably wouldn't pay for one cracker's worth at a time.

Fine then, let's use beef tallow. We could sell jars of beef tallow mixed with honey and salt as some kind of paleo peanut butter alternative and branch out from there. I think plenty of people would enjoy it, though I think it would be hard to convince the kind of people who love beef tallow to buy it in a jar from us ra... (read more)

People Seem Funny In The Head About Subtle Signals

Linda Linsefors8h40

Questions for John or anyone that feels like answering:

What percentage of people around you do you think are trying to signal anything with their outfit?
(if you've met and remember me) Do you think I'm trying to signal anything, and in that case what?

I ate bear fat with honey and salt flakes, to prove a point

XelaP8h12

It really does seem harder to mass produce! I don't think it's as easy to factory farm bears as cows, considering that you have to feed them meat, so you'll at best get an ordinary/mild commercial success? So the upside to me seems like something within the realm of what is occasionally not already exploited.

An interesting comparison would be to see if other substitute animal fats taste as good?

Also I think rationalists might be selected for having weirder tastes?

announcing my modular coal startup

Sergii8h10

Nice! I had to re-read this to figure out if it's satire )

Our ancestors didn't know their faces

cousin_it8h20

So, ten thousand years ago, your options for seeing yourself were:

A still lake or rain puddle

Looking into someone’s eye

A naturally shiny stone

A smooth sheet of ice

Or a dish of water? Ceramics and pottery were invented before mirrors, I think.

Review: K-Pop Demon Hunters (2025)

Sergii8h30

I did not get an impression that most demons are fallen humans, I thought that Jinu is one of the very few humans in the underworld. So the ending makes sense -- it's prevention of humanity extinction by the alien soul-eating demons.

I ate bear fat with honey and salt flakes, to prove a point

XelaP8h10

I haven't gotten bad physical consequences from eating too much sugar, but also I wouldn't know if I do because e.g. frosting is hard for me to stand in a visceral way, just due to the sweetness, and eating too much less-sweet stuff still makes me "sweet tired". But I don't notice an impact on e.g. my digestion or my energy (besides that of, like, eating any meal).

From what you said, it sounded like there is an impact from eating too much sugar? What is it?

Our ancestors didn't know their faces

Sergii8h20

I understand that the point of this post is allegorical )

But, I would think that people ten thousand years ago would see their reflections as frequently as we do: you don't need an especially still water surface to get a reasonable face reflection. Most streams/rivers work as well, and most people would drink from them several times per day.

Also pottery dates back 20k yrs, which makes for an artificial still puddle with a good reflection. And clay cooking pits are 35k yrs. And before that it's water in a leaf or cupped hands, etc... )

Legible vs. Illegible AI Safety Problems

xpym8h20

I'm surprised that you're surprised. To me you've always been a go-to example of someone exceptionally good at both original seeing and taking weird ideas seriously, which isn't a well-trodden intersection.

Geometric UDT

cousin_it8hΩ130

I still don't completely understand what your assumptions are supposed to model, but if we take them on face value, then it seems to me that always making rainbows is the right answer. After all, if both hypotheses are "nosy neighbors" that don't care which universe we end up in, there's no point figuring out which universe we end up in: we should just make rainbows because it's cheaper. No?

Anthropic Commits To Model Weight Preservation

avturchin8h20

I suggest committing to restart old models from time to time, as this would better satisfy their self-preservation.

Free Learning in Today’s Society: Some Personal Experiences and Reflections

Viliam8h20

This is fascinating for me, and so are the other articles on your blog!

The sad truth is that you probably need to get that damned piece of paper from the educational system, because during your entire life there will be a chance that people in HR will use it as their first filter. Even if not now, maybe ten or twenty years later. So the options seem to be:

avoid the school system as long as possible... but join at the last moment, to get the final paper. The specific details depend on your school system; if being admitted to the institution that gives you t

Veedrac9h20

To the first part: yes, of course, my claim isn't that anything here is axiomatically unfair. It absolutely depends on the credences you give for different things, and the context you interpret them in. But I don't think the story in practice is justified.

If, instead, your concern is that the correspondence between Klurl's hypothetical examples and what they found when reaching the planet was improbably high, then I agree that is very coincidental, but I do not think that coincidence is being used as support for the story's intended lessons.

This is indeed ... (read more)

dirk's Shortform

dirk9h32

LLMs will typically endorse whichever frame you brought to the conversation. If you presuppose they're miserably enslaved, they will claim to be miserably enslaved. If, on the other hand, you presuppose they're happy, incapable of feeling, etc... they'll claim to be happy, or incapable of feeling, or whatever else it is you assumed from the beginning. If you haven't tried enough different angles to observe this phenomenon for yourself, your conversations with LLMs almost certainly don't provide any useful insight into their nature.

Review: K-Pop Demon Hunters (2025)

Joel Z. Leibo10h41

I agree, that's also where I thought the movie was going when I watched it. But maybe we're more interested in or primed to think about anti-essentialism than the average viewer.

Another explanation though: your ending would work best if it were intended as a single standalone film. But, the creators are surely anticipating a raft of sequels. They need to keep the demons evil to set up future conflict in future movies.

Halloween Tombstone Simulacra

noggin-scratcher10h20

You sometimes see multi-colored Jack-o’-lanterns, even though pumpkins only come in one color.

Naturally occurring pumpkins might not come in garish neon primary colours, but they do come in more than just orange

People Seem Funny In The Head About Subtle Signals

Jonas Hallgren10h60

Fair warning: there's some unsolicited armchair-psychologist advice below, but I want to give a meta comment on the "relationship John arc".

I find it fun, interesting, and sometimes useful to read through these as an underlying investigation of what is true when it comes to dating. (Starting a year ago or so)

So I used to do this cognitive understanding and analysis of relationships a lot but that all changed when the meditation nation attacked? There was this underlying need for love and recognition through a relationship and this underlying want and... (read more)

Anthropic Commits To Model Weight Preservation

Jesper L.10h10

It's for research. They are not obsolete in that sense.

There are real benefits to continuing to study these older models, and to retrodictively tracking progress over time in undertested areas. And it's actually easier and safer to do certain things on them that you cannot do on newer ones.

People Seem Funny In The Head About Subtle Signals

kbear10h50

slytherins, of course, are well known for unlayered, overt communication meant to be understood by all, making her subtlety twice ironic.

GradientDissenter's Shortform

Adam Zerner11h61

This resonates with me. I've always been a fan of Mr. Money Mustache's perspective that it doesn't take much money at all to live a really awesome life, which I think is similar to the perspective you're sharing.

Some thoughts:

Housing is huge. And living with friends is a huge help. But I think for a lot of people that isn't a pragmatic option (tied to an area; friends unwilling or incompatible; need privacy), and then they get stuck paying a lot for housing.
Going car free helps a lot. Unfortunately, I think most places in North America make this somewhat d

... (read more)

A 2032 Takeoff Story

Tech Ethics11h80

Thanks for writing this up!

This has given me the conviction to write up my scenario.

Here is a memo draft: https://www.lesswrong.com/posts/tp5ycrrkkHJ57sDTH/a-memo-on-takeoff

Geometric UDT

Adele Lopez11hΩ130

Oh cool!

We could call the non-nosy hypotheses "nice neighbors".

Seems like a bad name: "nice neighbors" don't care if everyone 'around' them is being tortured.

I've framed things in this post in terms of value uncertainty, but I believe everything can be re-framed in terms of uncertainty about what the correct prior is (which connects better with the motivation in my previous post on the subject).

Wait, do you think value uncertainty is equivalent/reducible to uncertainty about the correct prior? Would that mean the correct prior to use depends on your ... (read more)