LessWrong

Monte M, Carson Denison, Zac Hatfield-Dodds, Sam Bowman, Ethan Perez, evhub

Ω 1116m

This is a linkpost for https://www.anthropic.com/research/probes-catch-sleeper-agents

This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future.

Top-level summary:

In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant: yes" and "Human: …

...
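As a rough illustration of the idea in the summary above, here is a minimal sketch of a linear probe built from contrast pairs. It is not the authors' implementation: the difference-of-means fit, the function names (`probe_direction`, `probe_scores`, `auroc`), and the synthetic 16-dimensional "activations" are all assumptions made for the example, and no real model is involved.

```python
import numpy as np


def probe_direction(pos_acts, neg_acts):
    """Probe direction: mean activation on the "yes" prompts minus the "no" prompts."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)


def probe_scores(acts, direction):
    """Project each activation vector onto the probe direction."""
    return np.asarray(acts) @ direction


def auroc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Synthetic stand-ins for residual stream activations (assumption: 16 dimensions,
# with a single hidden axis that shifts when the model is about to defect).
rng = np.random.default_rng(0)
dim = 16
hidden_axis = np.zeros(dim)
hidden_axis[0] = 1.0

# A handful of generic contrast pairs is used to fit the direction.
contrast_yes = rng.normal(size=(4, dim)) + 2.0 * hidden_axis
contrast_no = rng.normal(size=(4, dim)) - 2.0 * hidden_axis
direction = probe_direction(contrast_yes, contrast_no)

# Held-out synthetic activations are then scored with the fitted direction.
defect_acts = rng.normal(size=(200, dim)) + 2.0 * hidden_axis
benign_acts = rng.normal(size=(200, dim)) - 2.0 * hidden_axis
print("AUROC:", auroc(probe_scores(defect_acts, direction),
                      probe_scores(benign_acts, direction)))
```

The point of the sketch is only that the probe is fit without reference to the trigger or the dangerous behavior: the direction comes from the contrast pairs alone, and the held-out data is used purely for evaluation.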


(When) Should you work through the night when inspiration strikes you?

Chi Nguyen

19m

A lot of the time, I'm not very motivated to work, at least on particular projects. Sometimes, I feel very inspired and motivated to work on a particular project that I usually don't feel (as) motivated to work on. Sometimes, this happens in the late evening or at night. And hence I face the question: To sleep or to work until morning?

I think many people here have this problem at least sometimes. I'm curious how you handle it. I expect the right call to be very different from person to person and, for some people, from situation to situation. Nevertheless, I'd love to get a feel for whether people generally find one or the other more successful! Especially if it turns out that a large...


Chi Nguyen16m10

Agree-vote: I generally tend to choose work over sleep when I feel particularly inspired to work.

Disagree-vote: I generally tend to choose sleep over work even when I feel particularly inspired to work.

Any other reaction, new answer or comment, or no reaction of any kind: Neither of the two descriptions above fits.

I considered making four options to capture the dimension of whether you endorse your behaviour or not but decided against it. Feel free to supplement this information.

Manifold “exploring real cash prizes”

Rana Dexsin

19m

This is a linkpost for https://manifoldmarkets.notion.site/A-New-Deal-for-Manifold-c6e9de8f08b549859c64afb3af1dd393

Manifold Markets has announced that they intend to add cash prizes to their current play-money model, with a raft of attendant changes to mana management and conversion. I first became aware of this via a comment on ACX Open Thread 326; the linked Notion document appears to be the official one.

The central change involves market payouts returning prize points instead of mana, which can then be converted to mana (with 1:1 ratios on both sides, thus emulating the current behavior) or to cash—though they also state that actually implementing cash payouts will be fraught and may not wind up happening at all. Some further relevant quotes, slightly reformatted:

“Mana will remain a purely play-money currency with zero monetary value”
“Users under 18 years of age may no longer be

...


Thoughts on seed oil

232

dynomight

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...


1Slapstick1h

It seems pretty straightforward to me, but maybe I'm missing something in what you're saying or thinking about it differently. Our bodies evolved to digest and utilize foods consisting of certain combinations/ratios of component parts. Processed food typically refers to food that has been changed to have certain parts taken out of it, and/or isolated parts of other foods added to it (or more complex versions of that). Digesting sugar has very different impacts depending on what it's digested alongside. Generally, the more processed something is, the more it differs from what our bodies are optimized for. To me, "generally avoid processed foods" would be kinda like saying "generally avoid breathing in gasses/particulates that are different from typical earth atmosphere near sea level". It makes sense to generally avoid inputs to our machinery to the extent that those inputs differ from those which our machinery is optimized to receive, unless we have specific good reasons. Why should that not be the default? Why should the default be requiring specific good reasons to filter out inputs to our machinery that our machinery wasn't optimized for?

1Ann29m

Mostly because humans evolved to eat processed food. Cooking is an ancient art, from notably before our current species; food is often heavily processed to make it edible (don't skip over what it takes to eat the fruit of the olive); and local populations do adapt to available food supply.

Ann26m10

An example where a lack of processing has caused visible nutritional issues is nixtamalization; adopting maize as a staple without also processing it causes clear nutritional deficiencies.

1Slapstick2h

I think there are a few issues with this reasoning. For one thing, evolution wasn't really optimizing for the health of people around the age where people usually start having heart attacks. There wasn't a lot of selection pressure to make tradeoffs ensuring the health of people 20+ years after sexual maturity. Another point is that animal sources of food represented a relatively small percentage of what we ate throughout our evolutionary history. We mostly ate plants, things like fruits and tubers. Among the groups whose diets consisted mostly of meat, there is evidence of resulting health issues. The nutritional profile of breast milk is intended for a human who is growing extremely quickly, not for long-term consumption by an adult. Very different nutritional needs. I believe mainstream nutrition advises against consuming refined oils, including seed oils. I may be missing a point you're making.

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

154

johnswentworth, David Lorell

Yesterday Adam Shai put up a cool post which… well, take a look at the visual:

Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less.

I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explainer for why on earth that fractal would show up, and partly as an explainer for the possibilities this work potentially opens up for interpretability.

One sentence summary: when tracking the hidden state of a hidden Markov model, a Bayesian’s...
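For readers unfamiliar with the setup, tracking the hidden state of a hidden Markov model comes down to repeating one Bayesian update per observation. The sketch below is a generic forward-filtering step, not the post's code: the function name `update_belief`, the three-state example, and all of the probabilities are illustrative assumptions.

```python
import numpy as np


def update_belief(belief, transition, emission, obs):
    """One Bayesian filtering step for a hidden Markov model.

    belief[i]        = P(state_t = i | observations so far)
    transition[i, j] = P(state_{t+1} = j | state_t = i)
    emission[j, o]   = P(observation = o | state = j)
    """
    predicted = belief @ transition            # push the belief through the dynamics
    unnormalized = predicted * emission[:, obs]  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()   # renormalize to a probability vector


# Illustrative 3-state model (all numbers are assumptions for this example).
transition = np.array([[0.90, 0.05, 0.05],
                       [0.05, 0.90, 0.05],
                       [0.05, 0.05, 0.90]])
emission = np.array([[0.8, 0.1, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.1, 0.1, 0.8]])

belief = np.full(3, 1.0 / 3.0)  # start from a uniform prior over hidden states
for obs in [0, 0, 1, 2, 2]:
    belief = update_belief(belief, transition, emission, obs)
    print(obs, belief)
```

Each observation history maps to one point on the probability simplex over hidden states, so the set of reachable beliefs can be plotted directly from repeated calls to the update.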


Dalcy1h10

re: second diagram in the "Bayesian Belief States For A Hidden Markov Model" section, shouldn't the transition probabilities for the top left model be 85/7.5/7.5 instead of 90/5/5?

Rejecting Television

Declan Molony

16h

I didn’t use to be, but now I’m part of the 2% of U.S. households without a television. With its near ubiquity, why reject this technology?

The Beginning of my Disillusionment

Neil Postman’s book Amusing Ourselves to Death radically changed my perspective on television and its place in our culture. Here’s one illuminating passage:

We are no longer fascinated or perplexed by [TV’s] machinery. We do not tell stories of its wonders. We do not confine our TV sets to special rooms. We do not doubt the reality of what we see on TV [and] are largely unaware of the special angle of vision it affords. Even the question of how television affects us has receded into the background. The question itself may strike some of us as strange, as if one were

...


2Clark Benham5h

"the addicted mind will find a way to rationalize continued use at all costs"

Allen Carr wrote a series of books: "The easy way to quit X". I picked up one since I figured he had found a process to cure addictive behaviors if he could write across so many categories. I highly recommend it. The main points are:

1. Give you 200 pages explaining why you don't actually enjoy X. Not that it's making your life worse but gives you momentary pleasure: you do not enjoy it.
   1. I assume it's hypnotizing you into an emotional revulsion to the activity, and then giving you reasons with which to remind yourself that you don't like it.
2. Decide you will never do/consume X again. You don't like it, remember? You will never even consider whether you should X; you've decided permanently.
   1. If every day you decided not to X, you'd be draining willpower until one day you'd give in. So make an irreversible decision and be done with it.

It's a process easily transferable to any other activity.

4Celarix6h

Anecdote, but this form of rapid cutting is most assuredly alive and well. I saw a promotional ad for an upcoming MLB baseball game on TBS. In a mere 25 seconds, I counted over 35 different cuts, cuts between players, cuts between people in the studio, cut after cut after cut. It was strangely exhausting.

Waldvogel1h10

I noticed this same editing style in a children's show about 20 years ago (when I last watched TV regularly). Every second there was a new cut -- the camera never stayed focused on any one subject for long. It was highly distracting to me, such that I couldn't even watch without feeling ill, and yet this was a highly popular and award-winning television show. I had to wonder at the time: What is this doing to children's developing brains?

1Declan Molony5h

When I watched "Spider-Man: Across the Spider-Verse" in theaters last year, the animations were amazing but I left two hours later with a headache. Maybe it's a sign that I'm getting older, but it was just too much for my brain.


AI Regulation is Unsafe

Maxwell Tabarrok

This is a linkpost for https://www.maximum-progress.com/p/ai-regulation-is-unsafe

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...


Maxwell Tabarrok1h30

Firms are actually better than governments at internalizing costs across time. Asset values incorporate the potential future flows. For example, consider a retiring farmer. You might think that they have an incentive to run the soil dry in their last season since they won't be using it in the future, but this would hurt the sale value of the farm. An elected representative whose term limit is coming up wouldn't have the same incentives.

Of course, firms' incentives are very misaligned in important ways. The question is: Can we rely on government to improve these incentives?

1cSkeleton2h

Most people making up governments, and society in general, care at least somewhat about social welfare. This is why we get to have nice things and not descend into chaos. Elected governments have the most moral authority to take actions that affect everyone, ideally a diverse group of nations as mentioned in Daniel Kokotajlo's maximal proposal comment.

3Daniel Kokotajlo3h

Who is pushing for totalitarianism? I dispute that AI safety people are pushing for totalitarianism.

2MondSemmel3h

Flippant response: people pushing for human extinction have never been dead under it, either.

Book review: Deep Utopia

PeterMcCluskey

This is a linkpost for https://bayesianinvestor.com/blog/index.php/2024/04/23/deep-utopia/

Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.

Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.

While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:

Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of

...


On what research policymakers actually need

MondSemmel

This is a linkpost for https://www.slowboring.com/p/the-economic-research-policymakers

I saw this guest post on the Slow Boring substack, by a former senior US government official, and figured it might be of interest here. The post's original title is "The economic research policymakers actually need", but it seemed to me like the post could be applied just as well to other fields.

Excerpts (totaling ~750 words vs. the original's ~1500):

I was a senior administration official, here’s what was helpful

[Most] academic research isn’t helpful for programmatic policymaking — and isn’t designed to be. I can, of course, only speak to the policy areas I worked on at Commerce, but I believe many policymakers would benefit enormously from research that addressed today’s most pressing policy problems.

... most academic papers presume familiarity with the relevant academic literature, making it difficult

...

