Yeah, I agree with what you just said; I should have been more careful with my phrasing. 

Maybe something like: "The naive version of the orthogonality thesis, where we assume that AIs can't converge towards human values, is assumed to be true too often."

Compared to other people on this site, this is part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar things. I read a post recently in which Leo Gao argued that concave agents generally don't exist, because they tend to stop existing. I think there are pressures that conform agents to parts of the value landscape.

So I agree that the orthogonality thesis is presumed to be true way too often. It is more of an argument that convergence may not happen by default, but I'm also uncertain about how much evidence it actually gives you.

I have the same experience; I love having it connect two disparate topics together, it is very fun. I realised today that I use GPT as a brainstorming partner for basically 80%+ of the work tasks I do.

Hey! I saw that you had a bunch of downvotes and I wanted to get in here before you became too disillusioned with the LW crowd. A big point for me is that you don't really have any sub-headings or examples that get straight to the point. It is all one long text that reads like a direct transcription of your thoughts, which makes it really hard to engage with what you say. Of course you're saying controversial things, but if there were more clarity I think you would get more engagement.

(GPT is really OP for this nowadays.) Anyway, I wish you the best of luck! I'm also sorry for not engaging with any of your arguments, but I couldn't quite follow them.

Alright, quite a ba(y)sed point there, very nice. My lazy ass is looking for a heuristic here. It seems like the more the EMH holds in a situation (i.e. the more optimisation pressure has been applied), the more you should expect to be disappointed with a trade.

But what is a good heuristic for how much worse it will be? Maybe one just has to think about the counterfactual option each time?

I thought the orangutan argument was pretty good when I first saw it, but then I looked it up and realised that it isn't that orangutans aren't power-seeking; it's more that they only are in interactions that matter for the future survival of their offspring. It actually is a very flimsy argument. Some of the things he says are smart, like some of the stuff on the architecture front, but he always brings up his aeroplane analogy in AI safety. That one is really weak: I wouldn't get into an aeroplane without knowing it has been safety checked, and as a consequence I have a hard time taking him seriously when it comes to safety.

Very cool! It might be interesting to mention the connection between what the Buddha called dependent origination and the formation of the self-view of being an agent.

The idea is that your self is built through a loop of expecting your self to be there in the future, thus creating a self-fulfilling prophecy. This is similar to how agents are defined under the intentional stance, where it is informationally more efficient to describe yourself as an agent.

One way to view the alignment problem is through a self-loop taking over fully, or dependent origination in artificial agents. Anyway, I think it seems very cool and I wish you the best of luck!

I notice that I'm confused about the relationship between power-seeking arguments and counting arguments. Since I'm confused, I'm assuming others are too, so I would appreciate some clarity on this.

In footnote 7, Turner mentions that the paper "Optimal Policies Tend to Seek Power" falls under the irrelevant counting error described in the post.

In my head, the counting argument says that it is hard to hit an alignment target because there are a lot more non-alignment targets. This argument is (clearly?) wrong for the reasons specified in the post. Yet this doesn't address power-seeking, as that seems more like an optimisation pressure applied to the system, not something dependent on counting arguments?

In my head, power-seeking is more like saying that an agent's attraction basin is larger at one point of the optimisation landscape than at another. The same can also be said about deception here.

I might be dumb, but I never thought of the counting argument as true, nor as crucial to either deception or power-seeking. I'm very happy to be enlightened about this issue.

I buy the argument that scheming won't happen, conditional on us not allowing much slack between different optimisation steps. As Quintin mentions in his AXRP podcast episode, SGD doesn't have anything close to the level of slack that, for example, cultural evolution allowed. (See the free energy of optimisation debate from before; I can't remember the post names ;/) If that holds, then I don't see why the inner behaviour should diverge from what the outer alignment loop specifies.

I do, however, believe it is important to make this hold by specifying the right outer alignment loop as well as the right deployment environment, so that slack is minimised at all points along the chain and misalignment is avoided everywhere.

If we catch deception in training, we will be OK. If we catch actors that might create deceptive agents in training, we will be OK. If we catch states developing agents to do this, or if defense beats offense, we will be OK. I do not believe that this happens by default.
