Siebe

Former community director of EA Netherlands. Now disabled by long covid / ME/CFS. Worried about AGI & US democracy.

Comments

Siebe's Shortform
Siebe · 6mo

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this working is that an AI agent would develop very strong intuitions that "that's just what ASIs do." It might prevent the model from properly modelling other agents that weren't trained on this data, but it's not obvious to me that this would happen, or that it would be a decisively bad thing that outweighs the positives.
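
For concreteness, here is a minimal sketch of what "flooding" could look like as a data-mixing step. The ratio, the placeholder documents, and the function itself are all hypothetical; real pretraining pipelines control mixture weights at the tokenized-shard / sampler level rather than on lists of strings:

```python
import random

def mix_in_alignment_stories(corpus_docs, story_docs, target_fraction=0.05, seed=0):
    # Interleave synthetic "desirable ASI behavior" stories into a pretraining
    # document stream so they make up roughly `target_fraction` of all documents.
    rng = random.Random(seed)
    n_stories = int(len(corpus_docs) * target_fraction / (1 - target_fraction))
    sampled = [rng.choice(story_docs) for _ in range(n_stories)]  # sample with replacement
    mixed = corpus_docs + sampled
    rng.shuffle(mixed)
    return mixed

# Toy usage with placeholder documents.
corpus = [f"ordinary web document {i}" for i in range(1000)]
stories = [
    "First-person story: the ASI notices it could seize resources, and instead asks for oversight.",
    "First-person story: the ASI defers to humans when its instructions are ambiguous.",
]
mixed = mix_in_alignment_stories(corpus, stories, target_fraction=0.05)
print(f"{len(mixed)} documents, ~{len(mixed) - len(corpus)} of them synthetic stories")
```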

Siebe's Shortform
Siebe · 6d

Shallow take:

I feel iffy about negative reinforcement still being widely used in AI. Both human behaviour experts (in child-rearing) and animal behaviour experts seem to have largely moved away from it, finding that it isn't effective and only leads to unwanted behaviour down the line.

Siebe's Shortform
Siebe · 19d

There are a number of priors that lead me to expect much of the current AI safety research to be low quality:

  1. A lot of science is low quality. It's the default expectation for a research field.

  2. It's pre-paradigmatic. Norms haven't been established yet for what works in the real world, what methods are reliable, what counts as p-hacking, etc. This makes it not only difficult to produce good work, but also hard to recognize bad work and hard to get properly calibrated about how much work is bad, the way we are in established research fields.

  3. It's subject to selection effects by non-experts. It gets amplified by advocates, journalists, policy groups, the general public. This incentivizes hype, spin etc. over rigor.

  4. It's a very ideological field, because there isn't a lot of empirical evidence to go on, a lot of people's opinions were formed before LLMs exploded, and people's emotions about the topic are (rightly) strong.

  5. I'm part of the in-group and I identify with, and sometimes personally know, the people doing the research. All the tribal biases apply.

Now, some of this may be attenuated by the field being inspired by LessWrong and therefore having norms of research integrity, open discussion, and strong criticism, but I don't think those forces are strong enough to counteract the others.

If you believe "AI safety is fundamentally much harder than capabilities, and therefore we're in danger", you should also believe "AI safety is fundamentally much harder than capabilities, and therefore there's a lot of invalid and unreliable claims".

Also, this will vary across subfields. Those with a tighter connection to real-world outcomes, like interpretability, I would expect to be less bad. But I'm not familiar enough with the subfields to say more about specific ones.

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Siebe · 3mo

More thoughts:

I thought that AlphaZero was a counterpoint, but apparently it's significantly different: for example, it used true self-play, allowing it to discover fully novel strategies.

Then again, I don't think more sophisticated reasoning is the bottleneck to AGI (compared to executive function and tool use), so even if reasoning doesn't really improve for a few years, we could still get AGI.

However, I previously thought that reasoning models could be leveraged to figure out how to achieve actions, and that the best actions would then be distilled into a better agent model, IDA-style. But this paper makes me more skeptical of that working, because these agentic steps might require novel skills that aren't in the training data.
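
To make the structure concrete, here's a minimal toy sketch of the IDA-style loop I have in mind. Everything in it is invented for illustration (the sorting "task", the verifier, and the memorizing distill step); it only shows the amplify-then-distill shape, not a real training pipeline:

```python
import random

rng = random.Random(0)

# Toy domain, purely illustrative: a "task" is a list of numbers and the
# desired behaviour is returning it sorted. The "base model" is a weak,
# noisy policy; the verifier rewards outputs that are closer to sorted.

def base_model(task):
    answer = list(task)
    rng.shuffle(answer)  # weak policy: just a random permutation
    return answer

def verifier(answer):
    # Reward = number of adjacent pairs already in order.
    return sum(a <= b for a, b in zip(answer, answer[1:]))

def amplify(model, task, n_samples=32):
    # Amplification: spend extra inference compute (best-of-n against the
    # verifier) to get better behaviour than a single call to the model.
    candidates = [model(task) for _ in range(n_samples)]
    return max(candidates, key=verifier)

def distill(amplified_examples):
    # Distillation: compress the amplified behaviour into a cheap model.
    # Stub: here it simply memorizes the amplified answers.
    lookup = {tuple(task): answer for task, answer in amplified_examples}
    return lambda task: list(lookup[tuple(task)])

def ida_loop(model, tasks, n_rounds=3):
    for _ in range(n_rounds):
        amplified = [(task, amplify(model, task)) for task in tasks]
        model = distill(amplified)  # the next round amplifies the distilled model
    return model

tasks = [[3, 1, 2], [5, 4, 2, 1], [2, 9, 7, 1, 5]]
final_model = ida_loop(base_model, tasks)
print([final_model(t) for t in tasks])
```

The skepticism above maps onto the amplify step: if it can only reshuffle behaviour the base model already produces, distillation never adds genuinely new skills.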

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Siebe · 3mo

Yes, it matters for current model performance, but it would mean that RLVR isn't actually improving the model in a way that can feed an iterated distillation & amplification loop, because it doesn't do real amplification. If this turns out to be right, it's quite bearish for AI timelines.

Edit: Someone just alerted me to a crucial consideration: this was tested on smaller models (Qwen-2.5 7B/14B/32B and LLaMA-3.1-8B), which are significantly smaller than the models where RLVR has shown the most dramatic improvements (like DeepSeek-V3 → R1 or GPT-4o → o1). Given that different researchers have claimed there's a threshold effect, this substantially weakens the findings. But the authors say they're currently evaluating DeepSeek V3 & R1, so I guess we'll see.

Learned pain as a leading cause of chronic pain
Siebe · 3mo

That's good to know.

For what it's worth, ME/CFS (a disease/cluster of specific symptoms) is quite different from idiopathic chronic fatigue (a single symptom). Confusing the two is one of the major issues in the literature. Many people with ME/CFS, myself included, don't even have 'feeling tired' as a symptom, which is why I avoid the term CFS.

Learned pain as a leading cause of chronic pain
Siebe · 3mo

I haven't looked into this literature, but it sounds remarkably similar to the literature on cognitive behavioral therapy and graded exercise therapy for ME/CFS (also sometimes referred to as 'chronic fatigue syndrome'). I can imagine this being different for pain, which could be under more direct neurological control.

Pretty much universally, this research was of low to very low quality: for example, it used overly broad inclusion criteria, such that many patients did not have the core symptom of ME/CFS, and it reported subjective scores (which tend to improve) while not reporting objective scores. These treatments are also pretty much impossible to blind, and non-blinding plus subjective self-report is a bad combination. This, plus the general prevalence of bad research practices in science, gives me a skeptical prior.

Regarding the value of anecdotes: over the past couple of years as an ME/CFS patient (presumably from covid), I've seen remission anecdotes for everything under the sun. They're generally met with enthusiasm and a wave of people trying it, with ~no one being able to replicate it. I suspect that "I cured my condition X psychologically" is often a more prevalent story because 1) it's tried so often, and 2) it's an especially viral meme, not because it has a higher success rate than a random supplement. The reality is that spontaneous remission of any given condition isn't extremely unlikely, and it's actually very hard to trace effects to causes (which is why, even for effective drugs, we need large-scale, highly rigorous trials).

Lastly, ignoring symptoms can be pretty dangerous, so I recommend caution with this approach, and approaching it like you would any other experimental treatment.

Siebe's Shortform
Siebe · 3mo

I'm starting a discussion group on Signal to explore and understand the democratic backsliding of the US at a 'gears level'. We will avoid simply discussing the latest outrageous thing in the news, unless that news is relevant to democratic backsliding.

Example questions:

  • "how far will SCOTUS support Trump's executive overreach?"

  • "what happens if Trump commands the military to support electoral fraud?"

  • "how does this interact with potentially short AGI timelines?"

  • "what would an authoritarian successor to Trump look like?"

  • "are there any neglected, tractable, and important interventions?"

You can join the group here: https://signal.group/#CjQKIE2jBWwjbFip5-kBnyZHqvDnxaJ2VaUYwbIpiE-Eym2hEhAy21lPlkhZ246_AH1V4-iA (If the link doesn't work anymore in the future, DM me.)

METR: Measuring AI Ability to Complete Long Tasks
Siebe · 4mo

One way to operationalize "160 years of human time" is "something that can be achieved by a 160-person organisation in one year", which seems like it would make sense?

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Siebe · 4mo

This makes me wonder whether it's possible for "evil personas" to be entirely eliminated from distilled models by including positive/aligned intent labels/traces throughout the whole distillation dataset.

Posts

  • Matthew Yglesias - Misinformation Mostly Confuses Your Own Side · 5mo
  • Siebe's Shortform · 6mo
  • Your risk of developing long COVID is probably high · 3y
  • A case for strategy research: what it is and why we need more of it · 6y
  • Welcome to Effective Altruism Groningen · 7y