
Dan H

ai-frontiers.org

newsletter.safe.ai

newsletter.mlsafety.org

Sequences

Cost-Effectiveness Models for AI Safety
Catastrophic Risks From AI
CAIS Philosophy Fellowship Midpoint Deliverables
Pragmatic AI Safety

Comments

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
Dan H · 4mo

It's a great book: it's simple, memorable, and unusually convincing.

Good Research Takes are Not Sufficient for Good Strategic Takes
Dan H · 5mo

If a strategy is likely to become outdated quickly, it's not robust and not a good strategy. Strategies should be able to withstand a lot of variation.

Zach Stein-Perlman's Shortform
Dan H · 7mo

> capability thresholds be vague or extremely high

xAI's thresholds are entirely concrete and not extremely high.

> evaluation be unspecified or low-quality

They are specified and as high-quality as you can get. (If there are better datasets, let me know.)

I'm not saying it's perfect, but I wouldn't put them all in the same bucket. Meta's is very different from DeepMind's or xAI's.

Drake Thomas's Shortform
Dan H · 7mo

> though I don't think xAI took an official position one way or the other

I assumed most people assumed xAI supported it, since Elon did. I didn't bother pushing for an additional xAI endorsement given that Elon had endorsed it.

meemi's Shortform
Dan H · 7mo

For completeness, it's probably worth them mentioning that Nat Friedman also funded an earlier version of the dataset. (I was advising at the time and made the main recommendation that it needed to be research-level, because they were focusing on Olympiad-level problems.)

I can also confirm they aren't giving AI companies other than OpenAI, such as xAI, access to the mathematicians' questions.

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
Dan H · 9mo

> and have clearly been read a non-trivial amount by Elon Musk

Nit: AFAICT, he heard this idea in conversation with an employee.

Darwinian Traps and Existential Risks
Dan H · 1y

Relevant: Natural Selection Favors AIs over Humans

> universal optimization algorithm

Evolution is not an optimization algorithm (this is a common misconception, discussed in Okasha's Agents and Goals in Evolution).

Unlearning via RMU is mostly shallow
Dan H · 1y

We have been working on this issue for months and have made substantial progress on it: Tamper-Resistant Safeguards for Open-Weight LLMs.

A general article about it: https://www.wired.com/story/center-for-ai-safety-open-source-llm-safeguards/

Re: Anthropic's suggested SB-1047 amendments
Dan H · 1y

It's real.

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Dan H · 1y

It's worth noting that activations are one thing you can modify, but many of the most performant methods (e.g., LoRRA) modify the weights. (Representations = {weights, activations}, hence "representation" engineering.)
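To make the distinction concrete, here is a minimal PyTorch sketch of the two intervention points: steering activations at inference time via a forward hook versus folding a LoRA-style low-rank update into the weights. The layer, steering vector, and low-rank factors are hypothetical illustrations under these assumptions, not LoRRA itself.

```python
import torch

# Two places to intervene on a model's representations:
# (1) activations, transiently at inference time, via a forward hook;
# (2) weights, permanently, via a low-rank update (LoRA/LoRRA-style).

def add_activation_steering(layer: torch.nn.Module, v: torch.Tensor, alpha: float = 1.0):
    """Shift the layer's output by alpha * v; the weights stay untouched."""
    def hook(module, inputs, output):
        return output + alpha * v
    return layer.register_forward_hook(hook)

def apply_low_rank_weight_update(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, alpha: float = 1.0):
    """Fold a low-rank update into the weights: W <- W + alpha * (A @ B)."""
    with torch.no_grad():
        W += alpha * (A @ B)

# Toy usage on a single linear layer (hypothetical sizes).
layer = torch.nn.Linear(8, 8)
x = torch.randn(1, 8)

handle = add_activation_steering(layer, v=torch.randn(8), alpha=0.5)
steered = layer(x)   # activations steered for this forward pass only
handle.remove()

A, B = torch.randn(8, 2), torch.randn(2, 8)  # rank-2 factors
apply_low_rank_weight_update(layer.weight, A, B, alpha=0.1)
modified = layer(x)  # weights are now permanently changed
```

The hook-based edit disappears once the handle is removed, while the weight edit persists in the checkpoint, which is the practical difference between the two halves of {weights, activations}.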

Posts

AISN #61: OpenAI Releases GPT-5 (20d)
AISN #60: The AI Action Plan (1mo)
AISN #59: EU Publishes General-Purpose AI Code of Practice (2mo)
AISN #58: Senate Removes State AI Regulation Moratorium (2mo)
AISN #57: The RAISE Act (2mo)
AISN #56: Google Releases Veo 3 (3mo)
AISN #55: Trump Administration Rescinds AI Diffusion Rule, Allows Chip Sales to Gulf States (3mo)
AISN #54: OpenAI Updates Restructure Plan (4mo)
AISN #53: An Open Letter Attempts to Block OpenAI Restructuring (4mo)
AISN #52: An Expert Virology Benchmark (4mo)