LESSWRONG

Sodium

Semi-anon account so I could write stuff without feeling stressed. 

Comments
Sodium's Shortform
Sodium · 3d · 10

Guess: most people who have gotten seriously interested in AI safety in the last year have not read/skimmed Risks From Learned Optimization.

Maybe 70% confident that this is true. Not sure how to feel about this tbh.

faul_sname's Shortform
Sodium · 16d · 10

Huh I don't see it :/

faul_sname's Shortform
Sodium · 17d · 10

Oh huh, is this for pro users only? I don't see it (as a plus user). Nice.

faul_sname's Shortform
Sodium · 17d · 10

Is this with o3? I thought people lost access to o3 in ChatGPT?

I repeated those two prompts with GPT-5 thinking and it did not bring up the word salad in either case: 

(special tokens)

(random tokens)

Claude Sonnet 4.5: System Card and Alignment
Sodium · 17d · 40

Feels worth noting that the alignment evaluation section is by far the largest section in the system card: 65 pages in total (44% of the whole thing). 

Here are the section page counts:

  • Abstract — 4 pages (pp. 1–4)
  • 1 Introduction — 5 pages (pp. 5–9)
  • 2 Safeguards and harmlessness — 9 pages (pp. 10–18)
  • 3 Honesty — 5 pages (pp. 19–23)
  • 4 Agentic safety — 7 pages (pp. 24–30)
  • 5 Cyber capabilities — 14 pages (pp. 31–44)
  • 6 Reward hacking — 4 pages (pp. 45–48)
  • 7 Alignment assessment — 65 pages (pp. 49–113)
  • 8 Model welfare assessment — 9 pages (pp. 114–122)
  • 9 RSP evaluations — 25 pages (pp. 123–147)

 

The white box evaluation subsection (7.4) alone is 26 pages, longer than any other section!
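The "65 pages" and "44%" figures above can be sanity-checked by tallying the page ranges from the list (a quick sketch; the section names and page ranges are taken from the list above, nothing else is assumed):

```python
# Page ranges (first, last) from the system card's table of contents, as listed above.
sections = {
    "Abstract": (1, 4),
    "1 Introduction": (5, 9),
    "2 Safeguards and harmlessness": (10, 18),
    "3 Honesty": (19, 23),
    "4 Agentic safety": (24, 30),
    "5 Cyber capabilities": (31, 44),
    "6 Reward hacking": (45, 48),
    "7 Alignment assessment": (49, 113),
    "8 Model welfare assessment": (114, 122),
    "9 RSP evaluations": (123, 147),
}

# Inclusive page count per section.
lengths = {name: last - first + 1 for name, (first, last) in sections.items()}
total = sum(lengths.values())

print(lengths["7 Alignment assessment"])                        # 65
print(round(100 * lengths["7 Alignment assessment"] / total))   # 44
```

So the alignment assessment is 65 of 147 pages, which rounds to 44% of the card.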

What GPT-oss Leaks About OpenAI's Training Data
Sodium · 22d · 40

Mandarin speakers will have understood that the above contains an unwholesome sublist of spammy and adult-oriented website terms, with one being too weird to make the list here. 

Lol unfortunately I am good enough at Mandarin to understand the literal meaning of those words but not good enough to immediately parse what they're about. I was like, "what on earth is '大香蕉网'" (lit. "Big Banana Website"), and then I googled it and clicked around and was like "ohhhhh that makes a lot of sense."

Natural Latents: Latent Variables Stable Across Ontologies
Sodium · 1mo · 100

People might be interested in the results from this paper. 

Reports Of AI Not Progressing Or Offering Mundane Utility Are Often Greatly Exaggerated
Sodium · 2mo · 60

Another piece of evidence that AI is already having substantial labor market effects: Brynjolfsson et al.'s paper (released today!) shows that occupations that can be more easily automated by AI have seen less employment growth among young workers. For example, in software engineering:

I think some of the effect here is mean reversion from over-hiring in tech, rather than AI-assisted coding. However, note that we see a similar divergence if we take out the information sector altogether. In the graph below, we look at employment growth among occupations broken up by how LLM-automatable they are. The light lines represent the change in headcount in low-exposure occupations (e.g., nurses), while the dark lines represent the change in headcount in high-exposure occupations (e.g., customer service representatives).

We see that for the youngest workers, there appears to be a movement of labor from more exposed occupations to less exposed ones.

Aesthetic Preferences Can Cause Emergent Misalignment
Sodium · 2mo · 10

I knew it. The people who like Jeff Koons don’t just have poor taste—they’re evil. 

Four ways learning Econ makes people dumber re: future AI
Sodium · 2mo · Ω3100

“Decade or so” is not the crux.

 

Ok yeah that's fair. 

Posts

  • Xi Jinping's readout after an AI "study session" [ChinaTalk Linkpost] · 27 karma · 5mo · 1 comment
  • AI Can be "Gradient Aware" Without Doing Gradient hacking. · 21 karma · 1y · 1 comment
  • (Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need · 35 karma · 1y · 17 comments
  • Mira Murati leaves OpenAI / OpenAI to remove non-profit control · 58 karma · 1y · 4 comments
  • Sodium's Shortform · 3 karma · 1y · 29 comments
  • John Schulman leaves OpenAI for Anthropic [and then left Anthropic again for Thinking Machines] · 57 karma · 1y · 0 comments
  • Four ways I've made bad decisions · 18 karma · 1y · 1 comment
  • (Non-deceptive) Suboptimality Alignment · 5 karma · 2y · 1 comment
  • NYT: The Surprising Thing A.I. Engineers Will Tell You if You Let Them · 11 karma · 3y · 2 comments