Former community director EA Netherlands. Now disabled by long covid, ME/CFS. Worried about AGI & US democracy
More thoughts:
I thought that AlphaZero was a counterpoint, but apparently it's significantly different. For example, it used true self-play allowing it to discover fully novel strategies.
Then again, I don't think more sophisticated reasoning is the bottleneck to AGI (compared to executive function & tool use), so even if reasoning doesn't really improve for a few years we could get AGI.
However, I previously thought reasoning models could be leveraged to figure out how to achieve actions, and then the best actions would be distilled into a better agent model, you know, IDA-style. But this paper makes me more skeptical of that working, because these agentic steps might require novel skills that aren't inside the training data.
Yes, it matters for current model performance, but it means that RLVR isn't actually improving the model in a way that can be used for an iterated distillation & amplification loop, because it doesn't actually do real amplification. If this turns out right, it's quite bearish for AI timelines.
Edit: Ah, someone just alerted me to the crucial consideration that this was tested using smaller models (like Qwen-2.5 (7B/14B/32B) and LLaMA-3.1-8B), which are significantly smaller than the models where RLVR has shown the most dramatic improvements (like DeepSeek-V3 → R1 or GPT-4o → o1). Given that different researchers have claimed there's a threshold effect, this substantially weakens these findings. But they say they're currently evaluating DeepSeek-V3 and R1, so I guess we'll see.
That's good to know.
For what it's worth, ME/CFS (a disease/cluster of specific symptoms) is quite different from idiopathic chronic fatigue (a single symptom). Confusing the two is one of the major issues in the literature. Many people with ME/CFS, like me, don't even have 'feeling tired' as a symptom, which is why I avoid the term CFS.
I haven't looked into this literature, but it sounds remarkably similar to the literature of cognitive behavioral therapy and graded exercise therapy for ME/CFS (also sometimes referred to as 'chronic fatigue syndrome'). I can imagine this being different for pain which could be under more direct neurological control.
Pretty much universally, this research was of low to very low quality. For example, using overly broad inclusion criteria such that many patients did not have the core symptom of ME/CFS, and only reporting subjective scores (which tend to improve) while not reporting objective scores. These treatments are also pretty much impossible to blind. Non-blinding + subjective self-report is a pretty bad combination. This, plus the general amount of bad research practices in science, gives me a skeptical prior.
Regarding the value of anecdotes - over the past couple of years as an ME/CFS patient (presumably from covid) I've seen remission anecdotes for everything under the sun. They're generally met with enthusiasm and a wave of people trying it, with ~no one being able to replicate it. I suspect that "I cured my condition X psychologically" is often a more prevalent story because 1) it's tried so often, and 2) it's an especially viral meme. Not because it has a higher success rate than a random supplement. The reality is that spontaneous remission for any condition doesn't seem extremely unlikely, and it's actually very hard to trace effects to causes (which is why even for effective drugs, we need large-scale highly rigorous trials).
Lastly, ignoring symptoms can be pretty dangerous, so I recommend caution with this approach: treat it like you would any other experimental treatment.
I'm starting a discussion group on Signal to explore and understand the democratic backsliding of the US at ‘gears-level’. We will avoid simply discussing the latest outrageous thing in the news, unless that news is relevant to democratic backsliding.
Example questions:
“how far will SCOTUS support Trump's executive overreach?”
“what happens if Trump commands the military to support electoral fraud?”
“how does this interact with potentially short AGI timelines?”
“what would an authoritarian successor to Trump look like?”
“are there any neglected, tractable, and important interventions?”
You can join the group here: https://signal.group/#CjQKIE2jBWwjbFip5-kBnyZHqvDnxaJ2VaUYwbIpiE-Eym2hEhAy21lPlkhZ246_AH1V4-iA (If the link doesn't work anymore in the future, DM me.)
One way to operationalize "160 years of human time" is "thing that can be achieved by a 160-person organisation in 1 year", which seems like it would make sense?
This makes me wonder if it's possible that "evil personas" can be entirely eliminated from distilled models, by including positive/aligned intent labels/traces throughout the whole distillation dataset
Seems to me the name AI safety is currently still widely used, no? It covers much more than just alignment strategies, since it also includes stuff like control and governance.
The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.
Bad faith
There are also factions concerned about “misinformation” and “algorithmic bias,” which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.
Bad faith
AI Doomer coalition abandoned the name “AI safety” and rebranded itself to “AI alignment.”
Seems wrong
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this working is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent it from properly modelling other agents that aren't trained on this, but it's not obvious to me that that's going to happen, or that it would be bad enough to outweigh the positives.