Consider backdoors, as in the Sleeper Agents paper. So, a conditional policy triggered by some specific user prompt. You could probably quite easily fine-tune a recent model to be pro-life on even days and pro-choice on odd days. These would be just fully general, consistent behaviors, i.e. you could get a model that would present these date-dependant beliefs consistently among all possible contexts.
Now, imagine someone controls all of the environment you live in. Like, literally everything, except that they don't have any direct access to your brain. Could they implement similar backdoor in you? They could force you to behave that way, buy could they make you really believe that?
My guess is not, and one reason (there are also others but that's a different topic) is that humans like me and you have a very deep belief "current date doesn't make a difference for whether abortion is good and bad" that is extremely hard to overwrite without hurting our cognition in other contexts. Like, what is even good and bad if in some cases they flip at midnight?
So couldn't we have LLMs be like humans in this regard? I don't see a good reason for why this wouldn't be possible.
I'm not sure if this is a great analogy : )
You could, I think, have a system where performance clearly depends on some key beliefs. So then you still could change the beliefs, but that change would significantly damage capabilities. I guess that could be good enough? E.g. I think if you somehow made me really believe the Earth is flat, this would harm my research skills. Or perhaps even if you made me e.g. hate gays.
Thx. I was thinking:
Please let me know if that doesn't make sense : )
Sounds different. I never felt tired or low energy.
(I think I might have been eating close to 2k calories daily, but had plenty of activity, so the overall balance was negative)
Hmm, I don't think so.
I never felt I've been undereating. Never felt any significant lack of energy. I was hiking, spending whole days at a music festival, cycling etc. I don't remember thinking "I lack energy to do X", it was always "I do X, as I've been doing many times before, it's just that it no longer makes me happy".
Anecdotal evidence only. I hope this might be useful for someone, especially that semaglutide is often considered a sort of miracle drug (and for good reasons). TL;DR:
I've been taking Rybelsus (with medical supervision, just for weight loss, not diabetes). Started in the last days of December 2024 - 3mg for a month, 7mg for 2 months, then 14mg until 3 weeks ago when I went back to 7mg. This is, I think, a pretty standard path.
It worked great for weight loss - I went from 98kg to 87kg in 9 months with literally zero effort - I ate what I wanted, whenever I wanted, just ate less because I didn't want to eat as much as before. Also, almost no physiological side-effects.
I don't remember exactly when the symptoms started, but I think they were pretty signifiant around the beginning of March and didn't improve much until roughly a few days after I decreased the dose.
First, I noticed that work is no longer fun (and it was fun for the previous 2 years). I considered burnout. But it didn't really look like burnout.
Then, I considered depression. But I had no other depression symptoms.
My therapist explicitly called it more than once "anhedonia with unknown causes" so this is not only a self-diagnosis.
Some random memories:
See this reddit thread. You can also google "ozempic personality" - but I think this is rarely about just pure anhedonia.
(NOTE: All non-personal observations here are low quality and an LLM with deep search will do better)
Is this from a single FT run per dataset only, or an aggregate over multiple runs? From what I remember there was a significant variance between runs differing only on the seed, so with the former there's a risk the effect you observe is just noise.