LESSWRONG
LW

81
Jan Betley
1284Ω13012950
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
5Jan Betley's Shortform
7mo
39
megasilverfist's Shortform
Jan Betley1mo20

Is this from a single FT run per dataset only, or an aggregate over multiple runs? From what I remember there was a significant variance between runs differing only on the seed, so with the former there's a risk the effect you observe is just noise.

Reply
eggsyntax's Shortform
Jan Betley1mo20

Consider backdoors, as in the Sleeper Agents paper. So, a conditional policy triggered by some specific user prompt. You could probably quite easily fine-tune a recent model to be pro-life on even days and pro-choice on odd days. These would be just fully general, consistent behaviors, i.e. you could get a model that would present these date-dependant beliefs consistently among all possible contexts.

Now, imagine someone controls all of the environment you live in. Like, literally everything, except that they don't have any direct access to your brain. Could they implement similar backdoor in you? They could force you to behave that way, buy could they make you really believe that?

My guess is not, and one reason (there are also others but that's a different topic) is that humans like me and you have a very deep belief "current date doesn't make a difference for whether abortion is good and bad" that is extremely hard to overwrite without hurting our cognition in other contexts. Like, what is even good and bad if in some cases they flip at midnight?

So couldn't we have LLMs be like humans in this regard? I don't see a good reason for why this wouldn't be possible.

I'm not sure if this is a great analogy : )

Reply
eggsyntax's Shortform
Jan Betley1mo40

You could, I think, have a system where performance clearly depends on some key beliefs. So then you still could change the beliefs, but that change would significantly damage capabilities. I guess that could be good enough? E.g. I think if you somehow made me really believe the Earth is flat, this would harm my research skills. Or perhaps even if you made me e.g. hate gays.

Reply1
Jan Betley's Shortform
Jan Betley1mo20

Thx. I was thinking:

  • 1kg is roughly 7700 calories
  • I'm losing a bit more than 1kg per month
  • Deficit of 9k calories per month is 300 kcal daily

Please let me know if that doesn't make sense : )

Reply
Jan Betley's Shortform
Jan Betley1mo20

Sounds different. I never felt tired or low energy.

(I think I might have been eating close to 2k calories daily, but had plenty of activity, so the overall balance was negative)

Reply
Jan Betley's Shortform
Jan Betley1mo20

Hmm, I don't think so.

I never felt I've been undereating. Never felt any significant lack of energy. I was hiking, spending whole days at a music festival, cycling etc. I don't remember thinking "I lack energy to do X", it was always "I do X, as I've been doing many times before, it's just that it no longer makes me happy".

Reply1
Jan Betley's Shortform
Jan Betley1mo390

Anhedonia as a side-effect of semaglutide.

Anecdotal evidence only. I hope this might be useful for someone, especially that semaglutide is often considered a sort of miracle drug (and for good reasons). TL;DR:

  • I had pretty severe anhedonia for the last couple months
  • It started when I started taking semaglutide. I've never had anything like that before and I have no idea for possible other causes.
  • It mostly went away now that I decreased the dose.
  • There are other people on the internet claiming this is totally a thing

My experience with semaglutide

I've been taking Rybelsus (with medical supervision, just for weight loss, not diabetes). Started in the last days of December 2024 - 3mg for a month, 7mg for 2 months, then 14mg until 3 weeks ago when I went back to 7mg. This is, I think, a pretty standard path.

It worked great for weight loss - I went from 98kg to 87kg in 9 months with literally zero effort - I ate what I wanted, whenever I wanted, just ate less because I didn't want to eat as much as before. Also, almost no physiological side-effects.

I don't remember exactly when the symptoms started, but I think they were pretty signifiant around the beginning of March and didn't improve much until roughly a few days after I decreased the dose.

What I mean by anhedonia

First, I noticed that work is no longer fun (and it was fun for the previous 2 years). I considered burnout. But it didn't really look like burnout.
Then, I considered depression. But I had no other depression symptoms.
My therapist explicitly called it more than once "anhedonia with unknown causes" so this is not only a self-diagnosis.

Some random memories:

  • Waking up on Saturday thinking "What now. I can do so many things. I don't feeling like doing anything."
  • Doing things that always caused feeling of joy and pleasure (attending a concert, hiking, traveling in remote places etc) and thinking "what happened to that feeling, I should feel joy now".
    • More specific: this was really weird. Like, e.g. on a recent concert - I felt I really enjoy the music  on some level (had all the good stuff like "being fully there and focused on the performance", lasting feeling "this was better than expected" etc), it was only that the deep feeling of pleasure/joy was missing.
  • All my life I've always had something I wanted to do if I had more time - could be playing computer games, could be implementing a solution for ARC AGI, designing boardgames, recently mostly work. Not feeling that way was super weird.
  • Playing computer games that were always pretty addictive ("just one more round ... oops how is it now 3am?") with a feeling "meh, I don't care".

Other people claim similar things

See this reddit thread. You can also google "ozempic personality" - but I think this is rarely about just pure anhedonia. 

Some random thoughts

(NOTE: All non-personal observations here are low quality and an LLM with deep search will do better)

  • Most studies show GPL-1 agonists don't affect mood. But not all - see here.
    • (Not sure if makes sense) Losing weight is great. You are prettier and fit and this is something you wanted. So the mood should improve in some people - therefore perhaps null result in population implies negative effects on some other people?
  • I have ADHD. People with ADHD often have different dopamine pathways. Semaglutide affects dopamine neurons. So there's some chance these things are related. Also I think there are quite many ADHD reports in the reddit thread I linked above.
  • People claim it's easier to stop e.g. smoking or drinking while on semaglutide. So this suggests a general "I don't need things". This seems related.
Reply
Was Barack Obama still serving as president in December?
Jan Betley2mo20

Not everything replicates in Claudes, but some of the questions do. See here for examples.

Reply
Was Barack Obama still serving as president in December?
Jan Betley2mo30
  1. Not everything replicates in Claudes, only some of the questions do.
  2. You're using claude.ai. It has a very long system prompt that probably impacts many behaviors. I used the raw model, without any system prompt. See example printscreens from opus and sonnet.
Reply
Load More
129Was Barack Obama still serving as president in December?
2mo
16
60Concept Poisoning: Probing LLMs without probes
3mo
5
34Backdoor awareness and misaligned personas in reasoning models
5mo
8
53OpenAI Responses API changes models' behavior
7mo
6
15Are there any (semi-)detailed future scenarios where we win?
Q
7mo
Q
3
5Jan Betley's Shortform
7mo
39
26Finding Emergent Misalignment
7mo
0
83Open problems in emergent misalignment
8mo
17
333Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Ω
8mo
Ω
92
109Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
1y
39
Load More